Hacker News
by dalke 3741 days ago
I've toyed with the idea that if you bought a 1 PB hard disk, it would come pre-loaded with, say, a copy of archive.org and its search indices, or a large collection of movies. Also, if everyone had the same reference data set, then (handwaving!) it could make for some fantastic compression methods.
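A tiny, real version of the shared-reference idea already exists: DEFLATE preset dictionaries. The sketch below uses Python's `zlib` with its `zdict` parameter. `REFERENCE` is a hypothetical stand-in for the shared corpus. DEFLATE can only look back 32 KiB, so a petabyte-scale reference would need a very different scheme.

```python
import zlib

# Hypothetical shared reference data. In the imagined scheme, every disk would
# ship with it. DEFLATE only uses the last 32 KiB of a preset dictionary.
REFERENCE = b"The quick brown fox jumps over the lazy dog. " * 50


def compress_with_reference(data: bytes, reference: bytes = REFERENCE) -> bytes:
    """Compress data, letting matches point back into the shared reference."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(data) + comp.flush()


def decompress_with_reference(blob: bytes, reference: bytes = REFERENCE) -> bytes:
    """The receiver must hold the identical reference to decompress."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


message = b"The quick brown fox jumps over the lazy dog, again."
plain = zlib.compress(message, 9)
with_ref = compress_with_reference(message)
assert decompress_with_reference(with_ref) == message
print(len(message), len(plain), len(with_ref))
```

The compressed output is only meaningful to someone holding the same reference. Plain `zlib.decompress(with_ref)` fails because the stream asks for a dictionary it doesn't have. That dependency is why the scheme needs a standard content set that stays around.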

I don't think the economics work out: at 1 GB/s it probably takes too long to load the data, and, as this essay points out, most people will stream what they want on demand. I also doubt there will be a standard content set that lasts long enough for my imagined scheme (building compression models on the fly by reference to a shared corpus) to take root.
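To put "too long to load" in numbers, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 PB = 10^15 bytes) and reads "1G/s" as a sustained 1 gigabyte per second. A 1 gigabit per second rate is shown for comparison.

```python
# Decimal units, as drive vendors quote capacity.
PETABYTE = 10**15            # bytes
GIGABYTE_PER_S = 10**9       # bytes per second (assumed reading of "1G/s")
GIGABIT_PER_S = 10**9 // 8   # bytes per second, for comparison

SECONDS_PER_DAY = 86_400


def fill_time_days(capacity_bytes: int, rate_bytes_per_s: int) -> float:
    """Days needed to write capacity_bytes at a sustained transfer rate."""
    return capacity_bytes / rate_bytes_per_s / SECONDS_PER_DAY


print(f"1 GB/s: {fill_time_days(PETABYTE, GIGABYTE_PER_S):.1f} days")  # prints "1 GB/s: 11.6 days"
print(f"1 Gb/s: {fill_time_days(PETABYTE, GIGABIT_PER_S):.1f} days")   # prints "1 Gb/s: 92.6 days"
```

Roughly a million seconds per petabyte at 1 GB/s adds up to nearly two weeks of continuous writing per drive at the factory.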


> (handwaving!) it could make for some fantastic compression methods.

You could have indexes of hashes and store any chunks of anything in a big flat address space. You wouldn't even need to know what you have, just a massive number of archived chunks. (OK, that is more than hand-waving, maybe arm-waving?)
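The "indexes of hashes" idea is essentially content-addressed storage. Below is a minimal sketch of it. The class name `ChunkStore` and the fixed 4 KiB chunk size are hypothetical choices for illustration. Each chunk is keyed by its SHA-256 digest, so identical chunks are stored once, and a file becomes a list of digests. The store never needs to know what the data is.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class ChunkStore:
    """Toy content-addressed store: chunks live in one flat map keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}

    def put(self, data: bytes) -> list[bytes]:
        """Split data into chunks, store each unique chunk once, return its recipe."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)  # duplicates are not stored again
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[bytes]) -> bytes:
        """Reassemble the original bytes from the list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore()
a = bytes(range(256)) * 64           # 16 KiB: four identical 4 KiB chunks
r1 = store.put(a)
r2 = store.put(a + b"tail")          # same four chunks plus one new one
print(len(store.chunks))             # prints 2
```

Fixed-size chunks are the simplest choice, but inserting a single byte shifts every later boundary. Deduplicating systems usually use content-defined chunking (for example, a rolling hash) so boundaries follow the data instead.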

The problem is that new content is being generated at an increasing rate.

There was an estimate in 2011 that the total storage of everything, everywhere, by everyone was more than 250 exabytes, growing by around 25% annually.

There's going to be a lot of duplication in that, and a lot of it won't be public. So, as a ballpark guess, a complete collection of public-only sources (including all available commercial content of every kind ever recorded, academic papers, wikis, news sites, forums, and so on) is going to need 25-50 exabytes, with new content compounding at maybe 25% a year.
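The growth and size figures above can be checked with a few lines of arithmetic. This is a sketch using the comment's own numbers (250 EB in 2011, 25-50 EB public, 25% compound growth). It assumes decimal units (1 EB = 1000 PB) and the hypothetical 1 PB drive from the parent comment.

```python
import math


def project(size_eb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth (annual_growth=0.25 means 25%)."""
    return size_eb * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity to double at a fixed compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def disks_needed(size_eb: float, disk_pb: float = 1.0) -> int:
    """Whole disks of `disk_pb` petabytes needed to hold `size_eb` exabytes."""
    return math.ceil(size_eb * 1000 / disk_pb)


print(f"doubling time at 25%/yr: {doubling_time(0.25):.1f} years")
print(f"250 EB after 10 years: {project(250, 0.25, 10):.0f} EB")
print(f"1 PB disks for 25-50 EB: {disks_needed(25)}-{disks_needed(50)}")
```

At 25% a year, the collection doubles about every 3.1 years. Even the low estimate fills 25,000 one-petabyte disks, which is roughly what "a truck or two" would have to carry.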

So you could get the entire Internet delivered by a truck or two, but you probably wouldn't have anywhere to put it.