Hacker News new | ask | show | jobs
by dkh 459 days ago
> The broader context is that most LLM scrapers are a plague, and I cannot wait for this bubble to pop.

My thing is, there are legitimate uses for automated browsing, uses that could be extremely useful and yet nondamaging to (or even supported by!) site operators. But we'll never get to have them if the tools/methods to implement them are the same as the ones used by people inadvertently DDOSing the site they're trying to inhale the entire contents of. For them to not get lumped together, purveyors of the tools cannot remain "neutral" or hide implementation details or condone, whether explicit or implicit, bad behavior. We've seen this happen on the web before, and we're already seeing desperate organizations implement nuclear-option LLM scraper blockers that also take out things like RSS readers. Anyways... I may just have to write something about this...

> So, amusingly, they seem to have added a "ethical scraping" page[1] to their docs in between me looking at this a few hours ago and now.

Congrats, you made an impact :)