by nikcub
5385 days ago
My biggest issue (besides the initial TC article being a complete shocker) was the claim of a 60% saving from de-duplication and that each user only had 25GB of unique data. This research paper from Microsoft on Farsite[1] claims 'up to 50%' saving on de-dupe with a convergent file system - but that was tested against 500 computers in a corporate environment, and it was done back in 2002. Users now store a lot more photos, a lot more of their own video, and any content that is DRM'd is also unique. You can save on operating system and application files, but it isn't 60%.

There is nothing 'finally' about this additional information. The people discussing and criticising the claims on Twitter already knew about convergent encryption and the key being derived from the content. There is a lot more that is still unanswered - such as how an 'intelligent cache' allows 'unlimited' storage to be available offline.

I really wish these guys would release a research paper with their results, or include more information on their website, before they make such bold claims in public.

[1] http://research.microsoft.com/apps/pubs/default.aspx?id=6995...
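For anyone unfamiliar with the term, here is a minimal sketch of what "the key being derived from the content" means. It is not the service's actual scheme: the function names are hypothetical, and the SHA-256 keystream XOR is a toy stand-in for a real cipher such as AES. The point is only that identical plaintexts produce identical ciphertexts, which is what lets the server de-duplicate data it cannot read.

```python
import hashlib
from typing import Tuple


def convergent_key(content: bytes) -> bytes:
    # The key is simply a hash of the plaintext, so anyone holding
    # the same file derives the same key.
    return hashlib.sha256(content).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key + counter. Illustration only,
    # NOT a vetted cipher (an assumption made to stay stdlib-only).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(content: bytes) -> Tuple[bytes, bytes]:
    # Returns (key, ciphertext). The user keeps the key; the server
    # stores only the ciphertext.
    key = convergent_key(content)
    stream = _keystream(key, len(content))
    return key, bytes(a ^ b for a, b in zip(content, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream reverses the encryption.
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


def storage_id(ciphertext: bytes) -> str:
    # The server can index blobs by a hash of the ciphertext and keep
    # one copy per distinct id.
    return hashlib.sha256(ciphertext).hexdigest()
```

Two users who encrypt the same file independently end up with the same `storage_id`, so only one copy needs to be kept. A file that differs by a single byte gets a different key and a different ciphertext, and saves nothing.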
|
In most cases it isn't true that DRM'd content is unique per user. The computation involved in keying media on the fly while it's being downloaded is not insignificant when considered in volume. The added pain of storing everyone's unique keys also discourages this behavior. At worst you'll see different keys being used by region or datacenter, or perhaps key rotation on a weeks-to-months scale.
Some media (both DRM and non-DRM) will be trivially unique because of metadata like purchaser info or music tags. In some cases this makes the first block unique but all later blocks are deduped; in other cases you need to be somewhat content-aware so you can treat the header data separately from the real media data. This also allows you to catch a lot of data people ripped themselves using standard settings.
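To make the header point concrete, here is a rough sketch of fixed-size block de-duplication. Everything in it is an assumption for illustration: the 4096-byte block size, the function names, and the made-up header and payload bytes. It compares three cases: per-user headers that happen to fill exactly one block, headers of different lengths that shift every later block boundary, and a content-aware split that stores header and media separately.

```python
import hashlib
from typing import Iterable, List

BLOCK = 4096  # assumed block size, purely illustrative


def block_hashes(data: bytes, block_size: int = BLOCK) -> List[str]:
    # Hash each fixed-size block of the blob (the last one may be short).
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def dedup_savings(blobs: Iterable[bytes], block_size: int = BLOCK) -> float:
    # Fraction of blocks that do not need to be stored because an
    # identical block has already been seen.
    total = 0
    unique = set()
    for blob in blobs:
        hashes = block_hashes(blob, block_size)
        total += len(hashes)
        unique.update(hashes)
    return 1 - len(unique) / total if total else 0.0


# Hypothetical media payload: four distinct 4096-byte blocks.
payload = b"".join(hashlib.sha256(bytes([i])).digest() * 128 for i in range(4))

# Case 1: each user's header fills exactly one block.
aligned = dedup_savings([b"A" * BLOCK + payload, b"B" * BLOCK + payload])

# Case 2: headers of different lengths shift all later block boundaries.
shifted = dedup_savings([b"A" * 100 + payload, b"B" * 137 + payload])

# Case 3: content-aware, header and payload stored as separate blobs.
split = dedup_savings([b"A" * 100, payload, b"B" * 137, payload])
```

With these made-up inputs the aligned case saves 40% of blocks (only the two header blocks are unique per user), the shifted case saves nothing, and the content-aware split recovers the 40%. Real systems may use content-defined chunking instead of fixed blocks, which is a different technique from the one sketched here.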
You can save on operating system and application files, but it isn't 60%.
While I agree that photos and videos will be the bulk of their problem, I don't think that ruins their premise. The question is whether their userbase will be significantly overweight on heavy media creators. If it's a standard distribution, I wouldn't be surprised if a majority of people were under 10GB unique and 70%+ deduped.