Hacker News new | ask | show | jobs
by nsonha 2059 days ago
a lot of this selectorlib (get text or attributes) is achievable with xpath 1.0, which is built-in browsers and testing tools. What I do in my scrapping framework is that it takes a dict of name -> xpath and return a json object. This way the framework knows exactly what need to be etracted and stop loading the page as soon as all information are collected.