by bialecki 5324 days ago
One of my biggest pet peeves with crawling the web is having to use XPath. It's not that I have strong feelings about XPath; I just use CSS selector syntax so much that it's a pain not to be able to leverage that knowledge in this domain as well. Something like this is really awesome and is going to make crawling the web more accessible.
2 comments

If you are using Java to crawl the web, I would suggest using Jsoup for data extraction: you can extract data with jQuery-like methods.
If you're using Python, lxml has a cssselect module that makes this a breeze.
Very interesting, I'll definitely look into that. Thanks!