Hacker News
by hansvm 2115 days ago
I've seen this approach backfire a bit, too. Rather than having to scrape web content, I can just pull out my favorite sandboxed JS interpreter bindings, run the snippet, and extract the rich object it creates, which holds exactly the data I wanted. You only need a headless browser if there's meaningful interplay between the JS and the rest of the site.
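A minimal sketch of the "run the snippet, grab the object" idea. It swaps in Node's built-in `node:vm` module as the interpreter (the comment doesn't say which bindings it uses). The snippet, the `window.__INITIAL_STATE__` name, and `extractState` are all hypothetical. Node's own docs say `vm` is not a security mechanism, so treat this as isolation of globals only, not a real sandbox for hostile code.

```typescript
import * as vm from "node:vm";

// Hypothetical inline <script> body, as it might be cut out of a scraped page.
const scrapedScript = `
  window.__INITIAL_STATE__ = {
    products: [
      { id: 1, name: "Widget", price: 9.99 },
      { id: 2, name: "Gadget", price: 24.5 },
    ].map(p => ({ ...p, inStock: p.price < 20 })),
  };
`;

// Runs `script` in a fresh context whose only global is an empty `window`
// object, then returns whatever the script stored at window[key].
// node:vm separates globals but is NOT a security boundary; hostile code
// needs a real sandbox (separate process, isolated engine, etc.).
function extractState(script: string, key: string, timeoutMs = 1000): unknown {
  const sandbox = { window: {} as Record<string, unknown> };
  const context = vm.createContext(sandbox);
  vm.runInContext(script, context, { timeout: timeoutMs }); // throws if the script runs too long
  const value = sandbox.window[key];
  // JSON round-trip yields a plain object detached from the sandbox context.
  return value === undefined ? undefined : JSON.parse(JSON.stringify(value));
}

type State = { products: { id: number; name: string; price: number; inStock: boolean }[] };
const state = extractState(scrapedScript, "__INITIAL_STATE__") as State;
console.log(state.products.map(p => `${p.name}: ${p.inStock}`));
// [ 'Widget: true', 'Gadget: false' ]
```

Note the script's own logic (the `.map` computing `inStock`) runs for free, which is the advantage over parsing the text: you get the object after the page's code has built it.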

My favorite is when they embed the data as JSON in the page's inline JavaScript. That's easy mode scraping. :)
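For the easy-mode case, a rough sketch: find `<script type="application/json">` blocks and `JSON.parse` their contents. The page, the `page-data` id, and `extractJsonScripts` are made up for illustration, and a regex like this only holds up on a known, stable page; a real HTML parser is the sturdier choice.

```typescript
// Hypothetical page that ships its data as a JSON blob in a <script> tag.
const html = `
<html><body>
  <div id="app"></div>
  <script type="application/json" id="page-data">
    {"user": {"name": "alice", "karma": 42}, "items": [3, 1, 2]}
  </script>
  <script src="/bundle.js"></script>
</body></html>`;

// Returns the parsed body of every <script type="application/json"> tag.
// A regex is fine for one known, stable page; prefer a real HTML parser otherwise.
function extractJsonScripts(page: string): unknown[] {
  const re = /<script[^>]*\btype=["']application\/json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(page)) !== null) {
    results.push(JSON.parse(match[1]));
  }
  return results;
}

type PageData = { user: { name: string; karma: number }; items: number[] };
const [data] = extractJsonScripts(html) as PageData[];
console.log(data.user.name, data.items); // alice [ 3, 1, 2 ]
```

When the data is instead assigned in ordinary JS (e.g. `var state = {...};`), the text is often not strict JSON (unquoted keys, trailing commas), which is where running it in an interpreter wins.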
Haha, that'd be even better for sure.
I add a server-side delay for IPs that look like scrapers and serve them heavy JavaScript to burn through their resources. So far, it seems to work well in some cases.
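One way the delay half might look, as a hedged sketch: a per-IP sliding window that returns how long to stall the response once an IP exceeds a request budget. The comment doesn't say how "scrappy" IPs are detected; request rate is just an assumed signal here, and `Tarpit`, `TarpitOptions`, and every threshold are hypothetical. The heavy-JavaScript half isn't covered.

```typescript
interface TarpitOptions {
  windowMs: number;     // look-back window
  freeRequests: number; // requests per window served without delay
  stepMs: number;       // added delay per request over the limit
  maxDelayMs: number;   // cap, so connections are not held forever
}

// Hypothetical per-IP tarpit: remembers recent request times and returns how
// long to stall the next response.
class Tarpit {
  private readonly hits = new Map<string, number[]>();
  private readonly opts: TarpitOptions;

  constructor(opts: TarpitOptions) {
    this.opts = opts;
  }

  // Records a request from `ip` at time `now` (ms) and returns the delay in ms.
  delayFor(ip: string, now: number = Date.now()): number {
    const { windowMs, freeRequests, stepMs, maxDelayMs } = this.opts;
    // Drop timestamps that fell out of the window, then count this request.
    const recent = (this.hits.get(ip) ?? []).filter(t => now - t < windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    const excess = recent.length - freeRequests;
    return excess > 0 ? Math.min(excess * stepMs, maxDelayMs) : 0;
  }
}

// Small limits and a fake clock so the effect is visible.
const tarpit = new Tarpit({ windowMs: 1000, freeRequests: 3, stepMs: 100, maxDelayMs: 1000 });
const delays = [0, 10, 20, 30, 40].map(t => tarpit.delayFor("203.0.113.7", t));
console.log(delays); // [ 0, 0, 0, 100, 200 ]
```

In a Node `http` handler the returned value could feed something like `setTimeout(() => res.end(body), delay)`. The map never forgets an IP that simply stops calling, so a long-running server would want to prune it periodically.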