by rockdiesel 3408 days ago
http://www.ferryschedules.co

As someone who commutes via ferry, I got sick of checking the official schedule for my ferry because the site is slow and not mobile friendly. Then I noticed that many of the official schedule sites are PDF-only, slow, not optimized for mobile devices, or all of the above. So I've been slowly putting some of the schedules on this site over the past week. While the site isn't fully passive, the schedules will only have to be updated a couple of times per year since they change fairly infrequently.

It is monetized only via Google AdSense right now, but it has already paid for itself with a couple of ad clicks, and when all is said and done it should be some decent beer money every month.

Next steps are to learn how to automate the monitoring of the official schedule sites, so I can automate the updating of my site to match the official schedules. That would make it more passive.

1 comments

A cursory Google search suggests you could use a package like poppler to convert the PDF to raw text, and then in theory use regex to turn that text into data your server could use and serve.
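To make the poppler-plus-regex idea a bit more concrete, here is a rough sketch. It assumes poppler's `pdftotext` command is installed and on the PATH, and that the timetable prints times in a 12-hour form like "6:05 AM" or "11:45 p.m."; the function names and the regex are made up for illustration, and a real schedule will almost certainly need its own pattern.

```python
import re
import subprocess

# Assumed time format: 12-hour clock with AM/PM, e.g. "6:05 AM" or "11:45 p.m."
TIME_RE = re.compile(r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*([AaPp])\.?\s*[Mm]\.?")


def pdf_to_text(pdf_path):
    """Run poppler's pdftotext and return the extracted text.

    -layout keeps the column layout of the page; "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def extract_times(text):
    """Return every time found in the text as a 24-hour "HH:MM" string, in order."""
    times = []
    for hour, minute, meridiem in TIME_RE.findall(text):
        h = int(hour) % 12  # 12 AM becomes 0, 12 PM becomes 12 after the shift below
        if meridiem.lower() == "p":
            h += 12
        times.append(f"{h:02d}:{minute}")
    return times
```

Something like `extract_times(pdf_to_text("schedule.pdf"))` (the file name is a placeholder) would give a flat list of times; mapping them back to routes and days depends on how each PDF is laid out, so the output still needs a manual sanity check.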

If the PDFs are published as scans, as so many municipalities do, then OCR is the only way to go.

Either way, good luck and decently nice design.

I really appreciate you taking a look as well as providing your feedback.

Regarding the design, I just wanted to get something out the door quickly with a suitable look out of the box, so I decided to use MaterializeCSS (http://materializecss.com/). It's getting the job done so far, but I may revisit the design after I get all the content up.

And I'll look into poppler. Thank you for the recommendation.

If the timetables aren't particularly easy to read or parse, the OCR output could well be wrong, so you're going to have to check it anyway. You might as well do it manually whilst there's no clean technical way of doing it (maybe contact the companies if you get big numbers and ask about an arrangement?). You could set up a script, on a VPS if you're doing it that way, that checks the PDF daily and notifies you if the file changes. That'd be fairly trivial to set up.
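For what it's worth, the daily check could look something like the sketch below. It only uses the Python standard library; the function names, the state file, and the "notify by printing" step are all assumptions for illustration, not anything the official sites provide.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url):
    """Download the PDF and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def has_changed(content, state_file):
    """Compare the SHA-256 of content with the digest saved on the last run.

    The new digest is always written back to state_file. The first run has
    nothing to compare against, so it returns False and just records a baseline.
    """
    digest = hashlib.sha256(content).hexdigest()
    state = Path(state_file)
    previous = state.read_text().strip() if state.exists() else None
    state.write_text(digest)
    return previous is not None and previous != digest


def check(url, state_file):
    """Fetch the PDF once and print a message if it differs from last time."""
    if has_changed(fetch(url), state_file):
        print(f"Schedule changed: {url}")
```

Calling `check()` once a day from cron with the schedule's URL and a path for the state file would do it; cron can be configured to email whatever a job prints, which covers the notification part. One caveat: a byte-level hash also fires if the PDF is regenerated with identical timetables, so a notification means "go look", not "the times changed".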