Hacker News
by zak_mc_kracken 4385 days ago
Still not convinced by the reasons offered for client-side scraping. If I'm in my browser, I'm not interested in consuming JSON.

Scraping is really something that's better done in the back end, and today there are a lot of libraries that let you access websites from Java and run all the JavaScript you need in order to display the page properly.

4 comments

To each their own. I'm not interested in systematic scraping. I just want to take back, take home the web experience I've had, and be able to digest and work with it later. The things that I want to work with are the sights and experiences I've had. Client side is perfect.

Second, if I were trying to scrape, I'd rather do it with WebDriver than anything else. Injecting some client-side scraping tools and using WebDriver as a driver, not a driver/scraper, sounds remarkably better.

I see no reason to ever not use a browser to consume HTML content.

For example, favoriting a tweet on Twitter is lossy: there's no after-the-fact scraping I can do to know where I was or what time it was when I favorited the thing.

If we want to Publish Elsewhere, Syndicate to Own Site (#IndieWeb dubs this PESOS), if we want to have our own experiences we can talk about, client side is the way to go.

I was skeptical at first as well, coming from the good ol' curl/grep/sed backend scraping world, but I changed my mind once I considered authentication issues and saving instructions. No more need to try and auth on complex websites via phantom without knowing what actually happens: I can just log in, see in my browser what I actually wanna scrape, and still rerun it later as a script.

And I just loooove listening to artoo beep over and over ;)

Backend and frontend scraping just don't serve the same needs. Running backend monsters to scrape small to medium amounts of data only once is such a drag when frontend scraping can take less than half an hour to perform the same task. Plus you can see the results of your code live while browsing the DOM comfortably. Finally, nothing prevents you from using artoo on the backend, as long as you can execute JavaScript there.
Basically, it makes scraping accessible to almost anyone who can use a browser and write some CSS selectors.