by tonyplee
3488 days ago
22 PB = total YouTube data needed for the last two years. At 1/2 TB per user (like me), 22 PB = 44,000 users. Google would need 1000 times that space in their data centers to handle 44 million users. Also, I might think that 1/2 TB of data is very valuable, but only a little of it is interesting to a few of my friends and family members. It is probably very hard to monetize. Even I only browse it maybe a few times every few years. If I were a PM for such a product and tried to propose that Alphabet build 1000 new YouTube-size data centers to handle only 44 million users, I would have a hard time justifying it.
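The arithmetic above can be checked with a quick sketch. It assumes decimal units (1 PB = 1000 TB); the function names are made up for illustration, and the 22 PB and 1/2 TB figures are just the ones quoted in the comment.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def users_supported(capacity_pb: float, tb_per_user: float) -> int:
    """How many users fit in a given capacity at a fixed per-user footprint."""
    return int(capacity_pb * TB_PER_PB / tb_per_user)


def capacity_needed_pb(users: int, tb_per_user: float) -> float:
    """Capacity in PB needed to hold a given number of users."""
    return users * tb_per_user / TB_PER_PB


if __name__ == "__main__":
    # 22 PB at 1/2 TB per user
    print(users_supported(22, 0.5))               # 44000
    # 44 million users at 1/2 TB per user
    print(capacity_needed_pb(44_000_000, 0.5))    # 22000.0 PB, i.e. 22 EB
```

So 44 million users at this footprint works out to about 22 EB, which is the "1000 times" figure.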
|
As for BackBlaze, also a medium-large business, they're now storing... https://www.backblaze.com/blog/200-petabytes-of-customer-dat...
Both IA and BackBlaze are private/nontraded, which means they have less operating capital. Diskspace is simply not that expensive now.
There's a guy on a DC++ filesharing server (find a server list - it's one of the biggest ones) who has been sharing 400TB of data for some time. Speaking of DC++, most newer clients show the total shared data for all users connected to the server you're on, and that number on some of those larger servers is usually 1-2PB.
I also saw a guy on reddit a while back who was in exactly the right place at the right time when his workplace was upgrading, and he now has a nice $200/mo electricity bill in the form of, you guessed it, 400TB of diskspace. I'm not sure if he got it all for free, but I think he may have.
So it's not a money problem; it's a space problem and a power problem. This is why flash storage is so interesting: it generates less heat, can be packed somewhat more densely, and uses less power too. Once flash-vs-platter hits 49%/51% in terms of relative cost, things are going to get interesting.
At the moment the major retailers are just doing simple things like firmware customizations to run their disks at lower speeds (for nearline storage), or starting up with the disk powered off, and stuff like that. Facebook's cold storage datacenters also use Reed-Solomon encoding instead of RAID/ZFS, which gets them redundancy while using less space.
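To put a rough number on "less space": a Reed-Solomon layout with k data shards and m parity shards stores (k + m) / k raw bytes per logical byte and survives the loss of any m shards. The sketch below compares that with plain N-copy replication, which is a simpler stand-in for the RAID/ZFS comparison, not the same thing. The 10 + 4 layout and the 3 copies are illustrative assumptions only.

```python
def rs_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte for a (data + parity) erasure code."""
    return (data_shards + parity_shards) / data_shards


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when keeping N full copies."""
    return float(copies)


if __name__ == "__main__":
    logical_pb = 1.0  # hypothetical amount of user data

    # Assumed layout: 10 data + 4 parity shards, tolerates any 4 lost shards.
    print(logical_pb * rs_overhead(10, 4))        # 1.4 PB of raw disk

    # Assumed baseline: 3 full copies, tolerates 2 lost copies.
    print(logical_pb * replication_overhead(3))   # 3.0 PB of raw disk
```

Under those assumptions the erasure-coded layout needs less than half the raw disk while tolerating more simultaneous failures, at the cost of more work to rebuild lost data.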
I do think Google have actually done the kinds of allocations you speak of, using thin provisioning; after all, literally every new Google account gets 15GB of diskspace! And then there's sync profile data, whatever internal metadata is associated with the account (such as your search history), etc, that needs to be stored too.
I fully believe Google have multiple exabyte-scale datacenters. If they don't I'll be genuinely surprised.
Using thin provisioning (which is ultimately just "how much are they really using, and how can we encourage them not to use more than X") is how they manage it.
So you're right - actually provisioning enough free storage for these users would definitely be an unpleasant task. But they carefully balance what everyone uses with what they have available.
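A minimal sketch of that balancing act, assuming a made-up handful of accounts: the 15GB quota is the one mentioned above, but the per-account usage figures and the 25% headroom are invented for illustration, and this is not a description of how Google actually does it.

```python
QUOTA_GB = 15  # free quota per account, as mentioned above


def provisioning(usage_gb: list[float], headroom: float = 0.25) -> dict:
    """Compare promised capacity with what actually has to be on disk.

    usage_gb: hypothetical real usage per account, in GB.
    headroom: assumed spare fraction kept on top of real usage.
    """
    promised = QUOTA_GB * len(usage_gb)
    used = sum(min(u, QUOTA_GB) for u in usage_gb)  # nobody exceeds quota
    physical = used * (1 + headroom)
    return {
        "promised_gb": promised,
        "used_gb": used,
        "physical_gb": physical,
        "oversubscription": promised / physical,
    }


if __name__ == "__main__":
    # Five hypothetical accounts: most use little, one fills its quota.
    print(provisioning([0, 1, 2, 15, 2]))
```

With these invented numbers, 75 GB is promised but only 25 GB has to physically exist, so the pool is oversubscribed 3 to 1. The real ratio depends entirely on how usage is actually distributed.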