HN Mirror



	by michelb 819 days ago
	Not directly related, but how much energy is being wasted by thousands of companies continuously scraping the internet, collecting roughly the same information as everyone else, and then storing it in their own datacenters? I understand the commercial reasons for it, but this all seems very inefficient.

4 comments

skywhopper 818 days ago

Nowhere near what is currently being wasted on LLMs generating the content those scrapers will soon be copying to their datacenters, or the cryptomining propping up Bitcoin.


mike_hearn 818 days ago

Easy enough to fix if there's enough demand, just sell crawl logs. But maybe there's too much diversity amongst potential customers to make that business viable.


tracker1 818 days ago

When I was at a modest-sized public-facing site, roughly 3/4 of requests were from bots. The most painful part was that over half of those bot requests hit search result pages, which were much more costly to serve.


hasty_pudding 818 days ago

The semantic web was a beautiful idea at one point.
