Ask HN: Have you ever used anti detect browsers for web scraping?
17 points
by DantesTravel
1313 days ago
I've been in the web scraping industry for a while, and I often spend time building my own "Swiss Army knife" with Playwright or Selenium in case things get tough.
Thanks to a niche Substack I follow, I only discovered today that anti-detect browsers like GoLogin exist.
From what I can see, they look like a good solution for small projects but are hard to scale to larger ones because of licensing and infrastructure costs (most of them require a Windows machine to run).
Do any of you guys smarter than me use these browsers at a large scale? What does your tech stack look like?
A good trick I discovered is using WebKit through Playwright to bypass fingerprinting and related anti-bot measures. Firefox and Chrome simply leak too much information, even with various "stealth" modifications. For example, I have been able to reliably scrape a well-known company's site that implemented a "state of the art, AI-powered, behavioral analysis, etc." anti-bot product. Using Chrome/Firefox plus stealth measures in Playwright did not work; simply switching to WebKit with no further modifications did the trick.
Not exactly what you're asking, but my point is that with a little time and effort, I've usually been able to find fairly simple holes in most anti-bot measures. Since you're already versed in scraping, it probably wouldn't be terribly hard to build something similar to what you're looking for without having to pay for sketchy anti-detect browsers.
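For anyone who wants to try the engine swap, here is a minimal sketch using Playwright's Python sync API. The helper name `fetch_html`, the `SUPPORTED_ENGINES` tuple, and the example URL are my own placeholders, not anything from a specific project, and I'm assuming Playwright and its browser binaries are already installed (`pip install playwright`, then `playwright install webkit`). Whether WebKit gets past a given anti-bot product will depend on the site.

```python
# The three engine names Playwright exposes on its top-level object.
SUPPORTED_ENGINES = ("chromium", "firefox", "webkit")


def fetch_html(url: str, engine: str = "webkit") -> str:
    """Load `url` in the chosen Playwright engine and return the page HTML.

    Hypothetical helper: the only thing that changes between engines is
    which browser type gets launched, so swapping is a one-argument change.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine {engine!r}, expected one of {SUPPORTED_ENGINES}")

    # Imported lazily so the module still loads on machines without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser_type = getattr(p, engine)  # p.webkit, p.firefox or p.chromium
        browser = browser_type.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()


if __name__ == "__main__":
    # Placeholder URL; replace with the page you are actually testing.
    html = fetch_html("https://example.com", engine="webkit")
    print(len(html))
```

Passing `engine="chromium"` or `engine="firefox"` to the same function makes it easy to compare how a target responds to each engine before adding any stealth tweaks.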