by microtonal
463 days ago
How do you distinguish crawlers from regular visitors using a whitelist? As stated in the article, the crawlers show up with seemingly unique IP addresses and seemingly real user agents. It's a cat-and-mouse game. Only if you operate at the scale of Cloudflare and the like can you see which IP addresses are hitting a large number of servers in a short time span. (I am pretty sure that, if blocking gets more successful, they will next hand out N free LLM requests per month in exchange for user machines doing the scraping.) I fear the only solutions in the end are CDNs, making visits expensive using challenges, or requiring users to log in.
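To make the "many servers in a short time span" signal concrete, here is a rough sketch of how an operator with cross-site visibility might count it. Everything in it is an assumption for illustration: the `FanOutDetector` name, the 60-second window, and the threshold of 3 servers are made up, and it assumes events arrive in timestamp order. It is not how Cloudflare or any real CDN does it.

```python
from collections import defaultdict, deque


class FanOutDetector:
    """Flags IPs that contact many distinct servers within a sliding window.

    Hypothetical sketch: the window and threshold are arbitrary placeholders.
    """

    def __init__(self, window_seconds=60.0, max_servers=3):
        self.window_seconds = window_seconds
        self.max_servers = max_servers
        # ip -> deque of (timestamp, server), oldest first
        self._events = defaultdict(deque)

    def observe(self, ip, server, timestamp):
        """Record one request; return True if the IP now looks like a crawler.

        Assumes timestamps are non-decreasing for a given IP.
        """
        events = self._events[ip]
        events.append((timestamp, server))
        # Drop requests that have fallen out of the sliding window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            events.popleft()
        distinct_servers = {s for _, s in events}
        return len(distinct_servers) > self.max_servers


detector = FanOutDetector(window_seconds=60, max_servers=3)
for t, site in enumerate(["a.example", "b.example", "c.example", "d.example"]):
    flagged = detector.observe("203.0.113.7", site, timestamp=t * 10)
print(flagged)
```

A single site operator never sees more than one "server" per IP, which is why this only works with a view across many sites. It also does nothing against scraping spread over residential user machines, where each IP touches only a few servers.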