by ern_ave 10 days ago
> 99% of the traffic now is AI scraping the sources

I wonder if we should stop fighting this and instead create an API specifically for this purpose? Or a central repository that you could send your data to, so you could tell anyone wanting to scrape, "save yourself some time and just get my data from this other place."

1 comment

The thing, though, is that they are extremely idiotic. They constantly rescan the same files, I suppose out of FOMO that a line might have changed. I don't know what a special API would solve, especially since HTTP already has ETags to save you from re-downloading the whole damn file. But these bots don't care. They care so little that, after I temporarily took cgit down for kicks, they got 404s and still kept asking for the same files for days on end.
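
For anyone who hasn't used ETags, here is a rough sketch of the conditional request flow a well-behaved scraper could follow. It is a toy, in-memory model rather than real networking code: `make_etag`, `serve`, and `PoliteClient` are hypothetical names made up for illustration, and the exact-string comparison is a simplification of how `If-None-Match` is matched in practice.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical validator: a quoted, truncated content hash.
    # Real servers are free to derive ETags however they like.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve(body: bytes, request_headers: dict) -> tuple:
    """Toy server: returns (status, headers, body) for one resource."""
    etag = make_etag(body)
    # Simplification: exact string match against a single ETag value.
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # Not Modified, no body sent
    return 200, {"ETag": etag}, body


class PoliteClient:
    """Toy client that remembers ETags and revalidates instead of re-downloading."""

    def __init__(self):
        self.cache = {}  # url -> (etag, body)
        self.bytes_downloaded = 0

    def fetch(self, url: str, server_files: dict) -> bytes:
        headers = {}
        cached = self.cache.get(url)
        if cached is not None:
            # Tell the server which version we already hold.
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = serve(server_files[url], headers)
        if status == 304:
            return cached[1]  # unchanged: reuse the cached copy
        self.cache[url] = (resp_headers["ETag"], body)
        self.bytes_downloaded += len(body)
        return body
```

Fetching the same unchanged file twice only transfers the body once; the second request costs a header exchange and a 304. The body is downloaded again only after the file actually changes.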