Hacker News
by Crazyontap 2058 days ago
XPath is so powerful for web scraping; I only realized it recently. I'd been using CSS selectors for my occasional scraping needs and never bothered to learn XPath, until one day I decided on a whim to learn at least the basics.

Man, I can now write scrapers in 2 minutes that used to take me quite some time, thanks to the power of XPath. Things like ancestors, contains(), the ability to chain steps, etc. are so, so powerful. I used to write so many hacks just to do the same with CSS.


I realized a couple of months back that Google Sheets supports using XPath to scrape web pages. So now I have a "spreadsheet" scraping a page to see when a model of laptop goes on sale. Seems to work; at least, whenever I double-check that page manually, it matches the scraped result.
The only problem with the built-in IMPORTXML() function is that it doesn't execute JavaScript, so content rendered client-side won't show up. If you ever run into issues, give API Importer a try (I run a headless browser there to execute the JavaScript): https://gsuite.google.com/marketplace/app/api_importer/52965...
Indeed, I wrote a tool[0] to make it easy to grab a page and run XPath queries on it. It's really surprising how much mileage I've gotten out of it. Probably 95% of my web scraping needs can be solved with an XPath query or two. And if you later realize you need Selenium, XPath is well supported there, so porting your existing queries is usually quite straightforward.

0 - https://git.sr.ht/~charles/charles-util/tree/dev/bin/query-w...

Can you point me to the resources you used for learning? I've written a lot of scrapers and am knee-deep in CSS hacks.
If you know CSS well, I find this useful:

https://devhints.io/xpath

The problem with XPath is that you rarely use it, so you forget how to do certain things. Then you have to go and relearn it when you need it. Rinse and repeat.

Take a look at xidel.
It's the Swiss Army knife of scraping indeed. I feel like I can do anything with a scraper thanks to XPath.