by daurnimator 3436 days ago
The best library I've found for this sort of thing is gumbo. https://github.com/google/gumbo-parser

With its help I've created scrapers and crawlers that digest even the most disgusting HTML.