by franga2000 1395 days ago
> for me will always be stealthiness/human like behavior no matter how crappy the dev experience is

Can't say I agree. The biggest value for me is being able to respond to site changes quickly. Having a key bot offline for an extended period of time can be costly, so being able to update, test and deploy it quickly is a big selling point. The vast majority of sites, including major companies, have very rudimentary bot detection, and a high-quality proxy provider is often all you need to bypass it.

As for the advanced methods like reCAPTCHA v3 and Cloudflare, I don't know of any framework that passes those out of the box anyway, so you might as well use something that's easy to hack on and implement your own bypasses as necessary.

1 comment

We do a lot of web scraping (hundreds of millions of requests, multiple terabytes of data per month) and have been using Crawlee (previously known as Apify SDK) since its v0.20 days. We adopted it for exactly this reason. It's extremely versatile and very pleasant to build on. The combination of Node, JS and Crawlee's modular SDK offers a sweet spot for scraping that imho is light years ahead of anything else.

It also helps that the Apify devs themselves are nice and super responsive (we've had quite a few PRs merged over the last couple of years). The SDK code (and supporting libs like browser-pool and got-scraping) is clean and very easy to read, follow and extend. Happy to hear too that the license is going to remain unchanged.