Hacker News
by hnrodey 685 days ago
Nice job getting through all this. I kind of enjoy writing scrapers and browser automation in general. Browser automation is quite powerful and underexplored/underutilized by the average developer.

Something I learned recently that might help your scrapers: Playwright can sniff the network calls made through the browser (basically a programmatic API to the browser's Network tab).

The boost is that you let the website/webapp make its own API calls, and the scraper reads the data straight from those responses (rather than waiting for the page to render DOM updates).
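A rough sketch of what that can look like with Playwright's Python sync API. `page.on("response", ...)`, `response.url`, `response.headers` and `response.json()` are real Playwright calls; the helper names (`make_collector`, `scrape`), the `/api/` URL fragment and the `networkidle` wait are my own assumptions for illustration, not anything specific to the site being scraped.

```python
from typing import Any, Callable, List


def make_collector(url_fragment: str, sink: List[Any]) -> Callable[[Any], None]:
    """Build a response handler that stores JSON bodies of matching calls.

    `url_fragment` is a hypothetical filter such as "/api/"; adjust it to
    whatever the target page actually requests.
    """

    def on_response(response: Any) -> None:
        # Ignore everything that is not one of the calls we care about.
        if url_fragment not in response.url:
            return
        # Playwright exposes header names in lower case.
        content_type = response.headers.get("content-type", "")
        if "json" not in content_type.lower():
            return
        try:
            sink.append(response.json())
        except Exception:
            # Body unavailable or not valid JSON: skip it.
            pass

    return on_response


def scrape(page_url: str, url_fragment: str) -> List[Any]:
    """Load a page and return the JSON payloads it fetched along the way."""
    # Imported here so the handler above can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    captured: List[Any] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", make_collector(url_fragment, captured))
        # Assumption: waiting for network idle is enough for the calls to fire.
        page.goto(page_url, wait_until="networkidle")
        browser.close()
    return captured
```

Calling something like `scrape("https://example.com/listing", "/api/")` (placeholder URL) would return whatever JSON the page pulled from matching endpoints, with no DOM parsing involved.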

This approach falls apart if the page is server-side rendered, as there are no API calls to sniff.

1 comment

...or worse, if there _is_ an API call but the response is HTML instead of JSON.
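For that case, one possible fallback is to branch on the content type and run the HTML fragment through a parser instead of `json.loads`. This is only a sketch using the standard library; `decode_body` and `TextCollector` are made-up names, and pulling out bare text nodes is a stand-in for whatever extraction the real fragment needs.

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Tuple


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def decode_body(content_type: str, body: str) -> Tuple[str, Any]:
    """Return ("json", parsed) or ("html", text_chunks) for a response body."""
    if "json" in content_type.lower():
        return "json", json.loads(body)
    # Assumption: anything that is not JSON is treated as an HTML fragment.
    parser = TextCollector()
    parser.feed(body)
    parser.close()
    return "html", parser.chunks
```

With a sniffed response you would pass in its `content-type` header and body text; the HTML branch still means writing selectors, so it only saves you the page render, not the parsing.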