Hacker News new | ask | show | jobs
by showerst 247 days ago
A point orthogonal to this; consider whether you need browser automation at all.

If a website isn't using Cloudflare or a JS-only design, it's generally better to skip playwright. All the major AIs understand beautifulsoup pretty well, and they're likely to write you a faster, less brittle scraper.

3 comments

The vast majority of the modern internet falls into one of those two buckets though, no?
I mostly scrape government data so the sites are a little 'behind' on that trend, but no. Even JS heavy sites are almost always pulling from a JSON or graphql source under the hood.

At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.

Yeah, reverse engineering APIs is another fantastic approach. They aren't enough if you are dealing with wizards (eg typeform), but they can work really well
IF you can use crawlers, definitely do.

They aren't enough for anything that's login-protected, or requires interacting with wizards (eg JS, downloading files, etc)

If.