by scudd 2000 days ago
Cool library. I might use this for a side project of mine that also parses HTML data from Hacker News.

What are your impressions of scraper and html5ever? When I initially looked at HTML/XML parsing libraries for Rust, there didn't seem to be a standout library such as serde_json for JSON data. I was also considering using scraper + html5ever. However, I'm curious if scraper adds enough to warrant the additional dependency as opposed to directly using html5ever.


I haven't used scraper much. I personally find the predicate approach of select.rs [0] easier to use; in this case, however, the selector approach just made more sense. Standalone html5ever can be cumbersome to work with directly. scraper is basically an implementation of html5ever's `TreeSink` trait, whereas `select.rs` uses html5ever's `RcDom` to parse the document but stores it in a more convenient way. If you're looking for a minimal approach, you should look at select.rs, which basically depends only on html5ever.

[0] https://github.com/utkarshkukreti/select.rs
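
To make the predicate vs. selector distinction concrete, here is a toy sketch using only the standard library. It is not the API of select.rs or scraper: `Element`, `Predicate`, `Name`, `Class`, `And`, `find`, `select` and `sample_doc` are all hypothetical names made up for illustration, and the "document" is just a flat list of elements. The point is only the shape of the two styles: predicates are typed values you compose, while a selector is a string parsed at runtime that can therefore fail to parse.

```rust
// Toy model only: a flat list of elements stands in for a parsed document.
#[derive(Debug, Clone, PartialEq)]
struct Element {
    name: &'static str,
    class: &'static str,
    text: &'static str,
}

// Predicate style: small composable values that each test one element.
trait Predicate {
    fn matches(&self, el: &Element) -> bool;
}

struct Name(&'static str);
struct Class(&'static str);
struct And<A, B>(A, B);

impl Predicate for Name {
    fn matches(&self, el: &Element) -> bool {
        el.name == self.0
    }
}

impl Predicate for Class {
    fn matches(&self, el: &Element) -> bool {
        el.class == self.0
    }
}

impl<A: Predicate, B: Predicate> Predicate for And<A, B> {
    fn matches(&self, el: &Element) -> bool {
        self.0.matches(el) && self.1.matches(el)
    }
}

// Return every element the predicate accepts.
fn find<'a, P: Predicate>(doc: &'a [Element], p: P) -> Vec<&'a Element> {
    doc.iter().filter(|el| p.matches(el)).collect()
}

// Selector style: a string such as "a.storylink" is parsed at runtime.
// This toy parser understands only the `name.class` form.
fn select<'a>(doc: &'a [Element], selector: &str) -> Option<Vec<&'a Element>> {
    let (name, class) = selector.split_once('.')?;
    if name.is_empty() || class.is_empty() {
        return None;
    }
    Some(
        doc.iter()
            .filter(|el| el.name == name && el.class == class)
            .collect(),
    )
}

// Made-up sample data for the demonstration.
fn sample_doc() -> Vec<Element> {
    vec![
        Element { name: "a", class: "storylink", text: "First story" },
        Element { name: "span", class: "score", text: "42 points" },
        Element { name: "a", class: "storylink", text: "Second story" },
        Element { name: "a", class: "hnuser", text: "scudd" },
    ]
}

fn main() {
    let doc = sample_doc();
    let by_predicate = find(&doc, And(Name("a"), Class("storylink")));
    let by_selector = select(&doc, "a.storylink").expect("selector should parse");
    // Both styles pick out the same elements.
    assert_eq!(by_predicate, by_selector);
    for el in by_predicate {
        println!("{}", el.text);
    }
}
```

In the predicate style a malformed query is a compile error, while in the selector style it only shows up when the string is parsed. The selector string is more compact, though, and will be familiar to anyone who knows CSS.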

Hey, just FYI: when you run the hackernews and explore examples without enabling the tokio feature flag, you get compilation errors about undeclared types for all the tokio stuff. I think these examples just need entries in the Cargo.toml requiring the tokio feature, like the reddit example.

Once I add the tokio feature, they all run as expected.

Thanks, you're right. I fixed it just now by also requiring the tokio feature for all the tests in the Cargo.toml.
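
For anyone hitting the same problem in their own crate, the kind of Cargo.toml entry discussed above might look roughly like the sketch below. The example names `hackernews` and `explore` and the feature name `tokio` come from this thread; the exact layout of the actual project's manifest is an assumption.

```toml
# Sketch only: target names are taken from the thread, the rest is assumed.
# `required-features` tells Cargo to skip a target unless the listed
# features are enabled.
[[example]]
name = "hackernews"
required-features = ["tokio"]

[[example]]
name = "explore"
required-features = ["tokio"]
```

With entries like these, `cargo run --example hackernews --features tokio` builds and runs the example, while running it without the feature makes Cargo report that the target requires the feature, rather than failing on undeclared types. The same `required-features` key is also accepted on `[[test]]` targets.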