Hacker News
by nonchalantsui 456 days ago
More like having a series of fake doors and rooms built out in front of your home, with most of them leading back outside and not into the home.

Something of a labyrinth for AI crawlers, if you will.
Nobody will want to visit your home if it has trap doors.
More like it being a store, not a house, the inside being a complex maze of obnoxious ads, inhabited by performance artists who distract you so pickpockets can rob you - and because locals figured out blind people are immune to this, they started paying them to buy stuff for them, and now you retrofit the maze with confusing tactile markings, so as to direct blind people back out of the store.

The AI paranoia is getting out of hand. Worrying about bots spamming you is one thing, but discriminating against crawlers specifically because they're from AI companies - and conveniently omitting the difference between a bot that's crawling (and should obey robots.txt) vs. a bot that's acting as a user agent (and should not care about robots.txt) - isn't just poisoning communication; it's setting the commons on fire.

See also: The Dog in the Manger.

There have been multiple articles on the front page of HN about how there are a ton of AI crawlers that are really bad citizens - ignoring robots.txt, ignoring cache, re-scanning pages multiple times a day. The commons is already on fire and it’s not because of the actions of any of the “locals”.
How is that different from non-AI crawlers doing the same for the past decade or so? Tons of businesses engage in site crawling and scraping, and many of them are bad citizens.

My issue isn't with blocking badly behaved bots - it's with singling out LLMs (both training and use), or worse, assuming the problem is the association with AI rather than the bad bot behavior.

Volume and incentives.

Before this LLM craze, the biggest crawlers were search engines. They had a motivation not to bring down their targets, because who needs an index full of dead links? With LLM crawlers, all you need is text, and if the site is forced to shut down because of you, that's just less data for your competitors.

Also, nobody else steals your stuff like AI does. Doesn't take much thought to figure out the difference.
>it’s setting the commons on fire.

Rather than the AI companies turning up to the common pasture and starting to strip mine as fast as they can despite the protests of other commoners who were sustainably grazing their animals on it?

In the context of DDoS, they're more like over-grazing it. Should the commons be set on fire to prevent over-grazing? Technically it prevents over-grazing. In the same way that a bullet is a cure for cancer.