Hacker News
by egypturnash 456 days ago
There have been multiple articles on the front page of HN about how there are a ton of AI crawlers that are really bad citizens - ignoring robots.txt, ignoring cache headers, re-scanning pages multiple times a day. The commons is already on fire, and it’s not because of the actions of any of the “locals”.
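For reference, the two courtesies listed above (honoring robots.txt and not re-downloading unchanged pages) are cheap to implement. Here is a minimal Python sketch. It assumes the crawler already has the site's robots.txt text and any ETag/Last-Modified values from a previous fetch, and the function names are hypothetical. The standard-library `urllib.robotparser` decides whether a URL may be fetched, and conditional request headers let an unchanged page come back as a `304 Not Modified` instead of a full re-download.

```python
from typing import Dict, Optional
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content permits user_agent to fetch url.

    Hypothetical helper: the caller is assumed to have fetched robots.txt already.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build revalidation headers from a previous response's ETag / Last-Modified.

    If the page has not changed, a well-behaved server can answer with a short
    304 Not Modified instead of resending the whole body.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

A crawler that sends these headers on every revisit still rescans, but each unchanged page then costs the server a small 304 response rather than the full body.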

How is that different from non-AI crawlers doing the same for the past decade or so? Tons of businesses engage in site crawling and scraping, and many of them are bad citizens.

My issue isn't with blocking badly behaved bots - it's with singling out LLMs (both training and use), or worse, assuming the problem is the association with AI rather than bad bot behavior.
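One way to block on behavior rather than on the AI label is to throttle by request rate per client instead of matching User-Agent strings. Here is a minimal sketch, assuming clients are keyed by IP address and using hypothetical thresholds. It is a sliding-window limiter that refuses a client once it has made `max_requests` requests in the last `window_seconds`, regardless of whether the client calls itself a search engine, an LLM crawler, or a browser. The class and parameter names are illustrative, not from any particular library.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-client rate limiter that ignores the claimed User-Agent."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client_id (assumed to be an IP address) -> timestamps of accepted requests
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the request if the client is under its limit."""
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: rejected requests are not recorded
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
print([limiter.allow("203.0.113.5", t) for t in (0, 1, 2, 3)])
```

Because rejected requests are not recorded, a client that backs off regains access as its old timestamps age out of the window. Crawlers that rotate IP addresses would slip past a per-IP key like this one, and this sketch does not try to handle that.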

Volume and incentives.

Before this LLM craze, the biggest crawlers were search engines. They had a motivation not to bring down their targets, because who needs an index full of dead links? With LLM crawlers, all you need is the text, and if the site is forced to shut down because of you, that's just less data for your competitors.

Also, nobody else steals your stuff like AI does. Doesn't take much thought to figure out the difference.