| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by fbouvier 552 days ago

I fully understand your concern and agree that scrapers shouldn't be hurting web servers.

I don't think they are using our browser :)

But in my opinion, blocking a browser as such is not the right solution. In this case, it's the user who should be blocked, not the browser.

1 comments

jjcoffman 552 days ago

If your browser doesn't play nicely and obey robots.txt when its headless I don't think it's that crazy to block the browser and not the user.

link

fbouvier 552 days ago

Every tool can be used in a good or bad way, Chrome, Firefox, cURL, etc. It's not the browser who doesn't play nicely, it's the user.

It's the user's responsibility to behave well, like in life :)

link

hansvm 551 days ago

The first thing that came to mind when I saw this project wasn't scraping (where I'd typically either want a less detectible browser or a more performant option), but as a browser engine that's actually sane to link against if I wanted to, e.g., write a modern TUI browser.

Banning the root library (even if you could with UA spoofing and whatnot) is right up there with banning Chrome to keep out low-wage scraping centers and their armies of employees. It's not even a little effective also risks significant collateral damage.

link

slt2021 552 days ago

it is trivial to spoof user-agent, if you want to stop a motivated scraper, you need a different solution that exploits the fact that robots use headless browser

link

sangnoir 552 days ago

> it is trivial to spoof user-agent

It's also trivial to detect spoofed user agents via fingerprinting. The best defense against scrapers is done in layers, with user-agent name block as the bare minimum.

link