by rkho
3810 days ago
Completely agree. A friend and I tried to do something like this as a fun project at a hackathon. Getting to 80% wasn't difficult, just a lot of parsing the DOM for articles. The real pain was dealing with adverts, photo captions, comments, and other text that shouldn't be in the actual article, especially since we needed to detect paragraph and subheader breaks in order to run the parsed articles through text-to-speech.
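
For anyone curious what the "easy 80%" looks like, here is a rough sketch using only Python's standard `html.parser`. It is not what we actually wrote at the hackathon. The tag and class-name lists (`SKIP_TAGS`, `SKIP_HINTS`) are made-up heuristics for illustration, and it assumes reasonably well-formed markup with balanced tags, which real pages often don't have.

```python
from html.parser import HTMLParser

# Hypothetical heuristics; real sites need per-site tuning.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "figcaption", "form"}
SKIP_HINTS = ("advert", "comment", "caption", "promo", "share")
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4"}
# Void elements never get a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collects (kind, text) blocks, where kind is 'heading' or 'paragraph'."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0   # > 0 while inside an element we want to drop
        self._current = None   # tag name of the block being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth += 1
            return
        attr = dict(attrs)
        label = f"{attr.get('class') or ''} {attr.get('id') or ''}".lower()
        if tag in SKIP_TAGS or any(hint in label for hint in SKIP_HINTS):
            self._skip_depth = 1
        elif tag in BLOCK_TAGS:
            self._current, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._current:
            # Collapse whitespace so each block is one clean line of text.
            text = " ".join("".join(self._buf).split())
            if text:
                kind = "paragraph" if tag == "p" else "heading"
                self.blocks.append((kind, text))
            self._current = None

    def handle_data(self, data):
        if not self._skip_depth and self._current:
            self._buf.append(data)


def extract_blocks(html):
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Keeping headings and paragraphs as separate blocks is what makes the text-to-speech side workable, since you can pause between blocks or read headings differently. The remaining 20% is everything this misses: unclosed tags, captions that are plain `<p>` elements, and ads with innocent-looking class names.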