Hacker News new | ask | show | jobs
by fbouvier 505 days ago
I fully understand your concern and agree that scrapers shouldn't be hurting web servers.

I don't think they are using our browser :)

But in my opinion, blocking a browser as such is not the right solution. In this case, it's the user who should be blocked, not the browser.

1 comments

If your browser doesn't play nicely and obey robots.txt when its headless I don't think it's that crazy to block the browser and not the user.
Every tool can be used in a good or bad way, Chrome, Firefox, cURL, etc. It's not the browser who doesn't play nicely, it's the user.

It's the user's responsibility to behave well, like in life :)

The first thing that came to mind when I saw this project wasn't scraping (where I'd typically either want a less detectible browser or a more performant option), but as a browser engine that's actually sane to link against if I wanted to, e.g., write a modern TUI browser.

Banning the root library (even if you could with UA spoofing and whatnot) is right up there with banning Chrome to keep out low-wage scraping centers and their armies of employees. It's not even a little effective also risks significant collateral damage.

it is trivial to spoof user-agent, if you want to stop a motivated scraper, you need a different solution that exploits the fact that robots use headless browser
> it is trivial to spoof user-agent

It's also trivial to detect spoofed user agents via fingerprinting. The best defense against scrapers is done in layers, with user-agent name block as the bare minimum.