by fortes 3802 days ago
I used to work at Flipboard, and we invested a lot in this issue. It is not easy, and requires constant (constant!) maintenance.

Getting to 80% quality isn't hard. 90% is tricky. 95% incredibly costly.


Completely agree. A friend and I tried to build something like this as a fun hackathon project. Getting to 80% wasn't difficult; it was mostly a lot of DOM parsing to find the article. The real pain was dealing with adverts, photo captions, comments, and other text that shouldn't be part of the article itself. That got especially hard because we also wanted to detect paragraph and subheader breaks, since we were feeding the parsed articles into text-to-speech.
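For illustration, here is a minimal Python sketch of the kind of heuristic DOM pass described above. It keeps `<p>` and `<h1>` through `<h6>` text as separate paragraph/heading blocks and skips regions that look like ads, captions, or comments. The class names (`ArticleExtractor`, `extract_article`), the skipped tags, and the class/id tokens treated as noise are all assumptions for the example, not what the commenter or Flipboard actually used.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Assumed noise rules: whole tags to drop, plus class/id tokens that often
# mark ads, captions, or comment threads. Real sites need many more.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "figcaption", "form"}
NOISE_TOKENS = {"ad", "ads", "advert", "advertisement", "sponsored",
                "caption", "comment", "comments"}
# Void elements never get an end tag, so they must not open a skipped region.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Block tags we keep, mapped to the kind of break they represent.
BLOCK_KINDS = {"p": "paragraph", **{f"h{i}": "heading" for i in range(1, 7)}}


class ArticleExtractor(HTMLParser):
    """Hypothetical helper: collects (kind, text) blocks, skipping noise."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, str]] = []
        self._skip_tag = None   # tag that opened the current skipped region
        self._skip_depth = 0    # nesting depth of that tag inside the region
        self._block_tag = None  # <p> or <hN> currently being collected
        self._buf: List[str] = []

    def _is_noise(self, tag, attrs) -> bool:
        if tag in SKIP_TAGS:
            return True
        for name, value in attrs:
            if name in ("class", "id") and value:
                # Split "ad-slot" or "comment_list" into tokens so that
                # "header" or "read-more" do not match "ad".
                tokens = re.split(r"[\s_-]+", value.lower())
                if NOISE_TOKENS.intersection(tokens):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth += 1
            return
        if tag not in VOID_TAGS and self._is_noise(tag, attrs):
            self._skip_tag, self._skip_depth = tag, 1
        elif tag in BLOCK_KINDS:
            self._flush()  # an unclosed <p> ends when the next block starts
            self._block_tag = tag

    def handle_endtag(self, tag):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth -= 1
                if self._skip_depth == 0:
                    self._skip_tag = None
            return
        if tag == self._block_tag:
            self._flush()

    def handle_data(self, data):
        if self._skip_tag is None and self._block_tag is not None:
            self._buf.append(data)

    def _flush(self):
        if self._block_tag is not None:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.blocks.append((BLOCK_KINDS[self._block_tag], text))
        self._block_tag = None
        self._buf = []

    def close(self):
        super().close()
        self._flush()


def extract_article(html: str) -> List[Tuple[str, str]]:
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Run on a snippet containing an `<h2>`, two `<p>` paragraphs, a `<div class="ad-slot">`, a `<figcaption>`, and a `<section id="comments">`, it returns only the heading and the two paragraphs as `("heading", ...)` and `("paragraph", ...)` tuples. Keeping headings and paragraphs as separate blocks is what would let a text-to-speech step pause at section breaks. The heuristics are brittle, though: a page that wraps its text in `<div>`s instead of `<p>`s defeats this sketch entirely.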
Good point: the constant (constant!) maintenance means there would need to be a sustainable plan. On the other hand, if lots of projects started depending on the library, you'd at least get a steady supply of breakage reports, and perhaps fixes as well.