Hacker News
by michelb 819 days ago
Not directly related, but how much energy is being wasted by thousands of companies continuously scraping the internet and storing roughly the same information as everyone else in their own datacenters? I understand the commercial reasons for it, but this all seems very inefficient.

Nowhere near what is currently being wasted on LLMs generating the content those scrapers will soon be copying to their datacenters, or on the cryptomining propping up Bitcoin.
Easy enough to fix if there's enough demand: just sell crawl logs. But maybe potential customers are too diverse to make that business viable.
When I was at a modest-sized, public-facing site, roughly 3/4 of requests came from bots. The most painful part was that over half of them were search result pages, which were much more costly to serve.
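A rough back-of-the-envelope sketch of why expensive search pages make bot traffic hurt more than its request count suggests. The 3/4 bot share and "over half" search share come from the comment above; the 10x relative cost of a search page and the 10% human search rate are hypothetical numbers chosen only for illustration, not measurements from that site.

```python
def bot_cost_share(bot_fraction: float,
                   bot_search_fraction: float,
                   search_cost_multiplier: float,
                   human_search_fraction: float) -> float:
    """Return the fraction of total serving cost caused by bot requests.

    Simplifying assumption: a normal page costs 1 unit and a search
    result page costs `search_cost_multiplier` units.
    """
    for f in (bot_fraction, bot_search_fraction, human_search_fraction):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    if search_cost_multiplier <= 0:
        raise ValueError("search_cost_multiplier must be positive")

    def cost_per_request(search_fraction: float) -> float:
        # Average cost of one request given the share that are searches.
        return search_fraction * search_cost_multiplier + (1.0 - search_fraction)

    bot_cost = bot_fraction * cost_per_request(bot_search_fraction)
    human_cost = (1.0 - bot_fraction) * cost_per_request(human_search_fraction)
    return bot_cost / (bot_cost + human_cost)


if __name__ == "__main__":
    # 75% bots, half of bot hits are searches; 10x cost and 10% human
    # search rate are hypothetical.
    share = bot_cost_share(0.75, 0.5, 10.0, 0.1)
    print(f"{share:.1%}")  # prints "89.7%"
```

Under these made-up numbers, bots account for 75% of requests but close to 90% of the serving cost; with a multiplier of 1 (search no more expensive than any other page) the cost share falls back to the plain 75% request share.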
The semantic web was a beautiful idea at one point.