| HN Mirror

by microtonal 463 days ago

How do you distinguish crawlers from regular visitors using a whitelist? As stated in the article, the crawlers show up with seemingly unique IP addresses and seemingly real user agents. It's a cat-and-mouse game.

Only if you operate at the scale of Cloudflare and the like can you see which IP addresses are hitting a large number of servers in a short time span.
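A rough sketch of what that cross-site view could look like, assuming a central service receives one event per request from many servers. The class name, the 60-second window, and the threshold of 20 servers are all made up for illustration, and this is not a description of how Cloudflare actually does it:

```python
from collections import defaultdict, deque


class CrossSiteTracker:
    """Flags IPs that hit many distinct servers within a sliding time window.

    Hypothetical helper: window and threshold values are illustrative only.
    Assumes events for a given IP arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds=60.0, max_servers=20):
        self.window_seconds = window_seconds
        self.max_servers = max_servers
        # ip -> deque of (timestamp, server) pairs, oldest first
        self._hits = defaultdict(deque)

    def record(self, ip, server, timestamp):
        """Record one request; return True if the IP now looks like a crawler."""
        hits = self._hits[ip]
        hits.append((timestamp, server))
        # Drop events that have fallen out of the window.
        while hits and hits[0][0] <= timestamp - self.window_seconds:
            hits.popleft()
        distinct_servers = {s for _, s in hits}
        return len(distinct_servers) > self.max_servers
```

A single site only ever sees its own `server` value here, which is the point: without the aggregated view, each crawler IP looks like one ordinary visitor.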

(I am pretty sure that, if blocking gets more successful, they will next hand out N free LLM requests per month in exchange for user machines doing the scraping.)

I fear the only solutions in the end are CDNs, making visits expensive using challenges, or requiring users to log in.