| HN Mirror

	by rkho 3810 days ago
	Completely agree. A friend and I tried to do something like this as a fun project at a hackathon. Getting to 80% wasn't difficult, just a lot of parsing the DOM for articles. Dealing with things like adverts, photo captions, comments, and other text that shouldn't be in the actual article was the real pain -- especially detecting paragraph/subheader breaks, since we wanted to parse articles and feed them to text-to-speech.
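
A minimal sketch of the kind of filtering described above, using only Python's standard-library `html.parser`. Everything here is an illustrative assumption rather than what the hackathon project did: the `SKIP_TAGS` and `NOISE_CLASSES` lists, the `ArticleExtractor` class, and the `extract_blocks` helper are hypothetical names. It also assumes reasonably well-formed markup; unclosed tags inside a skipped subtree would throw off the depth counter.

```python
from html.parser import HTMLParser

# Hypothetical heuristics: subtrees under these tags or classes are treated as noise.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "figcaption", "form"}
NOISE_CLASSES = {"ad", "ads", "advert", "comment", "comments", "caption", "promo"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Void elements never get a closing tag, so they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []      # (kind, text) pairs in document order
        self._skip_depth = 0  # > 0 while inside a noisy subtree
        self._kind = None     # "heading" or "paragraph" while inside a block
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth += 1  # track nesting so we know when the subtree ends
            return
        classes = set((dict(attrs).get("class") or "").lower().split())
        if tag in SKIP_TAGS or classes & NOISE_CLASSES:
            self._skip_depth = 1
        elif tag == "p" or tag in HEADING_TAGS:
            self._kind = "heading" if tag in HEADING_TAGS else "paragraph"
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
            return
        if self._kind and (tag == "p" or tag in HEADING_TAGS):
            # Collapse whitespace and emit the finished block.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append((self._kind, text))
            self._kind = None

    def handle_data(self, data):
        if not self._skip_depth and self._kind:
            self._buf.append(data)


def extract_blocks(html):
    """Return (kind, text) pairs for headings and paragraphs outside noisy subtrees."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Keeping the heading/paragraph label on each block is what lets a text-to-speech step insert a pause or change of tone at subheader breaks. Real pages need far messier heuristics than a fixed class list, which is where the remaining 20% goes.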