Hacker News
by ovi256 5618 days ago
I'd expect most people use these services for backups of business documents, which are almost guaranteed to be unique.

I seem to recall that Dropbox (or another well-known online storage startup) implements this strategy. Maybe it works.

Which will be tiny.

Videos, audio, photos and game media will take up the bulk of the space. Of those, only photos and some of the videos are likely to be unique to a single user.

I doubt that's true for Dropbox, at least.

Consider that Dropbox gives you 50 GB on the basic plan. I'm guessing most people don't use that space to back up videos, games or their OS, but rather their documents, whatever projects they're working on, photos, and music.

Of those, only music offers a real chance for deduplication, and that assumes you can tell that two music files with different ID tags are the same song.

(Come to think of it, music easily takes up 70% of my Dropbox quota, so maybe it is worthwhile after all.)

You need a decent hash of every file anyway (to check for changes etc.), so deduplicating is pretty trivial. I don't think you'd need to do anything like checking the ID tag.
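A minimal sketch of that file-level approach in Python, assuming SHA-256 as the "decent hash": each upload is keyed on a digest of the whole file, so byte-identical files from different users are stored once. `FileStore` and its methods are hypothetical names for illustration, not Dropbox's actual design, and a real service would stream large files rather than hold them in memory.

```python
import hashlib


class FileStore:
    """Hypothetical store that keeps one copy of each distinct file across all users."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}            # content digest -> file bytes
        self.index: dict[tuple[str, str], str] = {}  # (user, path) -> content digest

    def upload(self, user: str, path: str, data: bytes) -> bool:
        """Record a file; return True only if its bytes were not already stored."""
        # The same hash used to detect changes doubles as the dedup key.
        digest = hashlib.sha256(data).hexdigest()
        self.index[(user, path)] = digest
        if digest in self.blobs:
            return False  # someone already uploaded identical bytes
        self.blobs[digest] = data
        return True

    def download(self, user: str, path: str) -> bytes:
        """Look up the shared blob this user's path points at."""
        return self.blobs[self.index[(user, path)]]
```

Because the digest covers every byte, changing a single byte of an ID tag yields a different digest, and the edited copy is stored in full as a new blob.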

But then my music files, whose ID tags I've edited, will hash differently from everyone else's copies.

It would be interesting for Dropbox to release numbers on how many music files are identical between different people.

I believe they could (and probably do?) de-dupe below the file level to handle this issue.

Good point. The ID3 tag is probably only in the header anyway. They'll just do it at a block level.
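A sketch of block-level de-duplication with fixed-size blocks, again in Python and again hypothetical (`BlockStore`, `split_blocks`, `BLOCK_SIZE` and the 4 KiB default are illustrative assumptions, not anything Dropbox has documented). Each file is stored as a list of block digests, and a block already present is never stored twice. Assuming the tag sits at the start of the file (where ID3v2 tags normally go; ID3v1 tags are a fixed 128 bytes at the end), editing it only invalidates the first block or two.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; a real service may use something else


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical store that de-duplicates fixed-size blocks across all users."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}                 # block digest -> block bytes
        self.files: dict[tuple[str, str], list[str]] = {}  # (user, path) -> block digests

    def upload(self, user: str, path: str, data: bytes) -> int:
        """Store a file as a list of block digests; return how many new blocks were stored."""
        digests = []
        new_blocks = 0
        for block in split_blocks(data, self.block_size):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first time anyone has uploaded this block
                new_blocks += 1
            digests.append(digest)
        self.files[(user, path)] = digests
        return new_blocks

    def download(self, user: str, path: str) -> bytes:
        """Reassemble a file from its stored blocks."""
        return b"".join(self.blocks[d] for d in self.files[(user, path)])
```

With a 4-byte block size, uploading `b"TAG1AAAABBBBCCCC"` stores four blocks, and then uploading `b"TAG2AAAABBBBCCCC"` stores only one new block. The catch with fixed-size blocks is that a tag edit which changes the tag's length shifts every later block boundary, so the audio blocks generally no longer line up with the stored ones and most of the file is stored again.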