by daurnimator 3436 days ago
The best library I've found for this sort of thing is gumbo.
https://github.com/google/gumbo-parser
With its help I've created scrapers and crawlers that digest even the most disgusting HTML.