|
|
|
|
|
by sir-alien
463 days ago
|
|
Can we not just have a whitelist for allowed crawlers and ban the rest by default? Then places like DuckDuckGo and Google can provide a list of IP addresses that their crawlers will come from. Then simply just don't include major LLM providers like OpenAI |
|
Only if you operate on the scale of Cloudflare, etc. you can see which IP addresses are hitting a large number of servers in a short time span.
(I am pretty sure next they will hand out N free LLM requests per month in exchange of user machines doing the scraping if blocking gets more succesful.)
I fear the only solution in the end are CDNs, making visits expensive using challenges, or requiring users to log in.