Hacker News
by mike_hearn 1291 days ago
Just getting plain text out of the web without getting flooded with boilerplate, noise, SEO spam, duplication, infinite pages like calendars, etc. is already a hard data engineering problem.