Hacker News
by charlesdaniels 2115 days ago
I'm surprised this isn't discussed more in the context of web scraping: XPath is not only much more powerful than CSS selectors, it can also be made robust against these kinds of anti-scraping techniques.

Sure, if you change the page structure enough you could defeat it, but that would take more than just adding a few divs. XPath lets you easily combine matches on not just CSS classes, but also the page structure itself, inner text, attributes, and so on. As a result, you can write some really powerful queries without any complex post-processing of the results.
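A minimal sketch of that idea, using Python's standard-library `xml.etree.ElementTree`. It implements only a limited subset of XPath (there is no `contains()`, for example) and needs well-formed markup, which real-world HTML often isn't, so a full XPath 1.0 engine would allow richer queries. The markup, class names, and the `Price:` label below are hypothetical. The query anchors on a visible label's text and on element structure rather than on class names, so randomized classes and an extra wrapper div don't break it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical product snippet, as originally served.
ORIGINAL = """
<html><body>
  <div class="product">
    <div class="row"><span>Price:</span><b>19.99</b></div>
  </div>
</body></html>
"""

# The same content after an anti-scraping change: randomized class names
# and an extra wrapper div.
OBFUSCATED = """
<html><body>
  <div class="x9f2a">
    <div class="q1z">
      <div class="k7p"><span>Price:</span><b>19.99</b></div>
    </div>
  </div>
</body></html>
"""

# Any <div> at any depth that has a <span> child whose complete text is
# exactly "Price:", then that div's <b> child. No class names are used.
PRICE_QUERY = ".//div[span='Price:']/b"


def extract_price(markup: str) -> Optional[str]:
    """Return the text of the first element matching PRICE_QUERY, or None."""
    # ElementTree needs well-formed XML; strip surrounding whitespace first.
    root = ET.fromstring(markup.strip())
    match = root.find(PRICE_QUERY)
    return match.text if match is not None else None


if __name__ == "__main__":
    for label, markup in [("original", ORIGINAL), ("obfuscated", OBFUSCATED)]:
        print(label, extract_price(markup))
```

Both snippets yield `19.99`, whereas a class-based selector such as `.product .row b` would stop matching the second one.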


XPath is one of those dinosaur technologies I only picked up a while ago, and man, was it a great way to find the right element and pass it to the tool doing the traversing and everything else. Being able to find a div that contains a string in a forest of divs was so damn nice.