Hacker News
by
jskherman
771 days ago
There are also Python libraries like trafilatura, featured here on HN some time ago, that can extract the main content from web pages to help augment the data.
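For anyone who wants to try it, here's a rough sketch of what that could look like. `trafilatura.extract` is the library's real entry point for pulling text out of an HTML string; the `page_text` helper, the `_TextOnly` class, and the stdlib fallback are my own illustrative additions, not part of trafilatura, and the fallback is much cruder than what the library does.

```python
from html.parser import HTMLParser

try:
    import trafilatura  # third-party: pip install trafilatura
except ImportError:
    trafilatura = None  # fall back to the crude stdlib stripper below


class _TextOnly(HTMLParser):
    """Hypothetical fallback: collect text nodes, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return readable text for one HTML document (hypothetical helper)."""
    if trafilatura is not None:
        # extract() returns None when it finds nothing usable
        text = trafilatura.extract(html)
        if text:
            return text
    parser = _TextOnly()
    parser.feed(html)
    return " ".join(parser.parts)
```

For a live page, `trafilatura.fetch_url(url)` downloads the HTML (or returns `None` on failure), and its result can be passed straight to `page_text`.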