Hacker News
by zzo38computer 449 days ago
I have temporarily disabled my HTTP server. (I set up port knocking for a day, but I got rid of it due to a kernel panic.)

My aim is not to prevent anyone from obtaining a copy if they want one, and I want to ensure that users can use curl, Lynx, and other programs; I do not want to require JavaScript, CSS, Firefox, Google, etc.

My problem is that these LLM scraping bots are badly behaved, making many requests and repeating them even though there is no good reason to do so, and potentially overloading the servers. These things are mentioned in the article. Some bots are not so badly behaved, and those are not the problem.

1 comment

How can you tell they are LLM bots?
I do not know for sure, but the requests come from many different IP addresses and carry many different user-agent values that all include "Mozilla". I have read elsewhere that these are apparently botnets used for LLM scraping.
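
For anyone who wants to check their own logs for the same pattern, here is a rough sketch of that heuristic in Python. It is not what I actually ran; the function name, the (ip, path, user_agent) tuple format, and the thresholds are all made up for illustration. It flags paths fetched from several distinct IP addresses with several distinct user-agent strings that contain "Mozilla". Real browsers also send "Mozilla", so this only suggests where to look and proves nothing by itself.

```python
from collections import defaultdict


def suspicious_paths(requests, min_ips=3, min_agents=3):
    """Return paths that look like distributed scraping targets.

    requests: iterable of (ip, path, user_agent) tuples (assumed format).
    min_ips, min_agents: hypothetical thresholds; tune them for your site.
    """
    ips = defaultdict(set)     # path -> distinct client IPs
    agents = defaultdict(set)  # path -> distinct "Mozilla" user agents
    for ip, path, user_agent in requests:
        # Only count requests whose user agent claims to be a browser.
        if "Mozilla" in user_agent:
            ips[path].add(ip)
            agents[path].add(user_agent)
    return sorted(
        path
        for path in ips
        if len(ips[path]) >= min_ips and len(agents[path]) >= min_agents
    )
```

With the default thresholds, a file requested by three different addresses under three different "Mozilla" user agents is reported, while repeated requests from a single address using curl or Lynx are ignored.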