by jskherman 771 days ago
There are also Python libraries like trafilatura, featured here on HN some time ago, that can extract the main content from websites to help augment the data.
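For anyone who wants to try it, here is a rough sketch of what that could look like. It uses trafilatura's `fetch_url` and `extract` functions; the helper names `html_to_text` and `page_to_text` are made up for illustration, and the import is done lazily only so the snippet loads even where the package is not installed.

```python
from typing import Optional


def html_to_text(html: str) -> Optional[str]:
    """Return the main text of an HTML document, or None if nothing is found.

    Hypothetical helper: a thin wrapper around trafilatura.extract.
    """
    import trafilatura  # third-party: pip install trafilatura

    return trafilatura.extract(html)


def page_to_text(url: str) -> Optional[str]:
    """Download a page and return its main text, or None on failure.

    Hypothetical helper: fetch_url returns None when the download fails,
    so that case is passed through instead of raising.
    """
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)
```

Calling `page_to_text` with the URL of an article should give back the body text without navigation or other page chrome, which can then be added to whatever dataset is being augmented. Check for `None` before using the result, since both the download and the extraction can come back empty.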