Hacker News
by billconan 1977 days ago
My main concern is pricing. Many websites use anti-scraping technologies, so scraping the raw HTML doesn't work anymore; you need to load everything and execute the JS. For example, I have seen some sites that can detect headless / Puppeteer mode too. I ended up creating my own scraping infra using vanilla Chrome...

Current SaaS platforms charge by request count. If I need to load everything, the cost will be too high.

1 comment

I thought about it too, but when you consider the cost of running headless Puppeteer (let's say on AWS) plus the cost of a good proxy that is charged per GB, it's often as expensive as some of these SaaS products, if not more. This is especially the case for websites with heavyweight JS/CSS/image assets.
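
To make the comparison concrete, here is a rough back-of-the-envelope sketch. The function names and every number in it are made up for illustration (not real quotes from AWS, any proxy vendor, or any SaaS), so plug in your own prices and page weights:

```python
def saas_cost(pages, requests_per_page, price_per_request):
    """Cost in dollars when a platform bills per request (hypothetical pricing model)."""
    return pages * requests_per_page * price_per_request


def self_hosted_cost(pages, mb_per_page, proxy_price_per_gb,
                     seconds_per_page, instance_price_per_hour):
    """Cost in dollars of running your own headless browser behind a per-GB proxy."""
    # Proxy traffic: every asset the browser loads goes through the proxy.
    bandwidth = pages * mb_per_page / 1024 * proxy_price_per_gb
    # Compute: time the browser instance spends rendering each page.
    compute = pages * seconds_per_page / 3600 * instance_price_per_hour
    return bandwidth + compute


if __name__ == "__main__":
    pages = 100_000  # assumed crawl size

    # Assumed: the SaaS bills one request per rendered page at $0.001.
    print(f"SaaS:        ${saas_cost(pages, 1, 0.001):,.2f}")

    # Assumed: 3 MB per page, $10/GB proxy, 5 s per page, $0.10/hour instance.
    print(f"Self-hosted: ${self_hosted_cost(pages, 3, 10, 5, 0.10):,.2f}")
```

With inputs like these the per-GB proxy term dominates the self-hosted side, which is why asset-heavy pages push the DIY cost up. Blocking images/CSS or using a cheaper proxy changes the picture a lot, so treat the result as only as good as the inputs.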