by simondotau 1403 days ago
I know many people’s immediate reaction is that 500GB seems small, but remember, they’re not storing the whole internet — just an index of it.

The whole internet would need a much larger RAID, with literally dozens of hard disks.

2 comments

To be fair, this is not actually all that far off the truth. Do the napkin math on how much storage you'd need to index, say, 25 billion documents, and you'll see it's not actually a lot. At one byte per document, 25 billion bytes is 25 GB. How many bytes do you need per document (for the index)? A kilobyte, maybe? If so, that's 25 TB. While certainly not something you can fit on a thumb drive, it's hardly something you need a planet-scale computer cluster to store.

You may need it for the IOPS of actually using it, but that's another thing entirely.
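
For anyone who wants to poke at the numbers, here's a minimal sketch of that napkin math. The 1 KB per document figure is just the guess from above, not a measured value, and the helper names (`index_size_bytes`, `human`) are made up for illustration. Units are decimal (1 TB = 10^12 bytes).

```python
# Back-of-envelope index sizing. All inputs are guesses, not measurements.
DOCS = 25_000_000_000      # documents to index (figure from the comment above)
BYTES_PER_DOC = 1_000      # assumed index cost per document, roughly 1 KB


def index_size_bytes(docs: int, bytes_per_doc: int) -> int:
    """Total index size if every document costs the same number of bytes."""
    return docs * bytes_per_doc


def human(n: float) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1000:
            return f"{n:g} {unit}"
        n /= 1000
    return f"{n:g} EB"


print(human(index_size_bytes(DOCS, 1)))              # 25 GB
print(human(index_size_bytes(DOCS, BYTES_PER_DOC)))  # 25 TB
```

Swap in your own `BYTES_PER_DOC` if you think a kilobyte is too generous or too stingy; the result scales linearly.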

They kind of do store the internet though, don't they? They store a cached version of most pages.
Tbh the source code of the whole internet would probably be a few PB at most. Text is really cheap to store, especially because it can be compressed. Images and videos are what make the premise impossible: even with perfect compression, you'd need an impossible amount of storage to hold every video published by mankind.
Well, most of that is on YouTube, so they kinda have a copy anyway :)
Yeah but pages are HTML and HTML compresses extremely well. With the latest algorithms you could probably get as low as one byte per page. Probably even better with a decent middle-out compression algorithm.

(Also yes, you are correct within the realm of reality, but not within the realm of comedy.)

https://en.m.wikipedia.org/wiki/Yes,_and

Just take the “middle” out of the web. If a page is more than 50 KB excluding images but including CSS and scripts, just ignore it.
It should be noted that HTML compresses extremely well.
∞ * .1 = ∞

But the 25TB you showed above is a better prospect.
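
On the "HTML compresses extremely well" point, here's a rough way to check it yourself with Python's standard `zlib` module. The page below is a synthetic, deliberately repetitive sample I made up, so the ratio it prints will flatter the claim; real pages will likely compress less well, and nothing here gets anywhere near one byte per page.

```python
import zlib

# Synthetic, highly repetitive HTML (an assumption for illustration only).
row = '<li class="item"><a href="/post/{0}">Post number {0}</a></li>\n'
html = (
    "<html><body><ul>\n"
    + "".join(row.format(i) for i in range(500))
    + "</ul></body></html>\n"
).encode("utf-8")

# Level 9 is zlib's highest (slowest) compression setting.
compressed = zlib.compress(html, 9)
ratio = len(compressed) / len(html)

print(f"{len(html)} bytes -> {len(compressed)} bytes (ratio {ratio:.2f})")

# Sanity check: the compression is lossless.
assert zlib.decompress(compressed) == html
```

Whatever ratio you get on real pages, it's a constant factor, which is why it helps with a finite index but does nothing for the ∞ case.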