by arashf 5540 days ago
Hi all, Arash from Dropbox here. We understand the concern that the government could try to guess whether a particular file has been uploaded to Dropbox based on processing times and then request that Dropbox identify a user who has access to that file. However, to seek user content information, the government needs to comply with the provisions of the Electronic Communications Privacy Act by obtaining a warrant supported by probable cause (or in some cases a court order from a judge). Those safeguards protect user privacy. De-duplication does not make users any more vulnerable to intrusive government actions. Today, a government agency could ask any online service to provide the names of all users who have a particular file, whether or not the service employs de-duplication. And in that case, the government would also need to support its request with a warrant or court order. The rules that provide a check against unwarranted government snooping apply to online services equally, regardless of their back-end architecture.
5 comments

Granted, but the point of the article is that other services which do not have this ability are not vulnerable to said orders: they cannot do what they do not have the ability to do.
Yes, this is key. Tarsnap and other encrypt-then-dedup services simply cannot comply with such an order.
Still, this lets the government probe (as an ordinary user) to know that at least one prior user has a contraband file, without any warrant. That could be the thread of probable cause they use to take the next steps.
If you don't mind my asking, what is the percentage savings achieved by de-duplication across all of Dropbox? Some others here have wondered if it was premature optimization.
It wasn't a premature optimization. It was both a better experience for the user (saves bw/reuploads for the user) and was simpler to implement (can keep things in one global bucket) given we didn't want things like renames to trigger reuploads and had to use checksums as a result.
But you could prevent reuploads with per-user de-duplication, while avoiding the privacy issue of cross-user de-duplication.

I could see why this would be more work to implement (you have to key on user+contenthash), but it would still be interesting to know how much Dropbox and its users actually benefit from cross-user de-duplication.
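To make the user+contenthash idea concrete, here is a minimal sketch of the two keying choices. It is an illustration only, not Dropbox's implementation: the `DedupStore` class, its in-memory dict, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy in-memory blob store (hypothetical, for illustration only).

    per_user=False keys blobs on the content hash alone (cross-user dedup).
    per_user=True keys blobs on (user, content hash) (per-user dedup).
    """

    def __init__(self, per_user: bool):
        self.per_user = per_user
        self.blobs = {}

    def _key(self, user: str, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        return (user, digest) if self.per_user else digest

    def upload(self, user: str, data: bytes) -> bool:
        """Return True if the bytes had to be transferred, False if deduplicated."""
        key = self._key(user, data)
        if key in self.blobs:
            return False  # already stored under this key, so nothing to send
        self.blobs[key] = data
        return True


payload = b"same file contents"

cross_user = DedupStore(per_user=False)
cross_user.upload("alice", payload)           # transferred
print(cross_user.upload("bob", payload))      # False: bob can tell someone already has it

per_user = DedupStore(per_user=True)
per_user.upload("alice", payload)             # transferred
print(per_user.upload("bob", payload))        # True: bob learns nothing about alice
print(per_user.upload("alice", payload))      # False: alice's own re-upload is still skipped
```

Under these assumptions, per-user keying still avoids re-uploads after a rename within one account, at the cost of storing and transferring the same bytes once per account.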

I have a hard time understanding this line of argument.

Per-user deduplication would mean I need not upload the same file twice into my own account? What's the use of this?

I keep some of my 'paid for' software installables backed up in my Dropbox, and they tot up to ~1.5 GB (the Humble Indie Bundle games). When I started the upload however, it took maybe 5 seconds because of cross-user deduplication, and I am super grateful to them for this feature.

I imagine this feature saves users tons of bandwidth, as most of the people I know use Dropbox for backing up important software, rare music and videos.

> I have a hard time understanding this line of argument.

It's not a line of argument. It's a line of inquiry. You've given anecdotal evidence that cross-user deduplication benefits you and people you know, but what about some actual numbers from Dropbox?

Producing actual numbers -- e.g., "cross-user deduplication saves our users 30% of their upload time and bandwidth, on average" -- seems like a great way for Dropbox to counter this issue.

> I imagine this feature saves users tons of bandwidth, as most of the people I know use Dropbox for backing up important software, rare music and videos.

We don't have to imagine! Let there be numbers!

Also -- "rare" music and videos that everyone's uploading duplicates of? ;)

I know quite a few people who appreciate Dropbox’s de-duplication. It’s not only about saving Dropbox bandwidth.
To the point that it does not make users any more vulnerable to intrusive government actions, as you put it -- in practice, that's not going to be true.

Let's say right now Agent Bob or RIAA lawyer Cindy decides they want to know everyone in the country who has a copy of a file. There's no practical way for them to do that. But now, they can upload target.iso to Dropbox, see that it uploads instantly, and all they have to do is get a court order to compel Dropbox to tell them all the other users who have that file.

Every user that has that file is now exposed. Dropbox is now a single point of failure for every user's privacy, and vulnerable to attacks from any legal court order -- and we've seen what can happen with the abuse of prosecutorial and judicial power. Copyright infringement's the obvious case, but when you start to consider how this ability to fish for files could be abused, and how tempting it'll be for people to try to abuse it, it seems more serious than I think you're considering.
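A rough sketch of why the two steps fit together so easily, assuming a service keeps an index from content hash to the accounts holding that content. The `owners_by_hash` index and both function names are hypothetical; nothing here is taken from Dropbox's actual design.

```python
import hashlib
from collections import defaultdict

# Hypothetical index: content hash -> set of account names holding that content.
owners_by_hash = defaultdict(set)


def record_upload(user: str, data: bytes) -> bool:
    """Register an upload. Returns True if the content was already known,
    which is what an 'instant upload' would reveal to the uploader."""
    digest = hashlib.sha256(data).hexdigest()
    already_known = bool(owners_by_hash[digest])
    owners_by_hash[digest].add(user)
    return already_known


def users_with_file(data: bytes) -> set:
    """The lookup an operator could be compelled to run: every account
    that holds exactly these bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return set(owners_by_hash.get(digest, set()))


record_upload("alice", b"target.iso contents")
record_upload("bob", b"target.iso contents")

# Step 1: the prober uploads the file and sees it was already known.
print(record_upload("prober", b"target.iso contents"))
# Step 2: with an order in hand, the operator's lookup is a single dict access.
print(users_with_file(b"target.iso contents"))
```

The first step needs no cooperation from the service at all; only the second step involves the legal process described in the original post.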

Has your organisation considered offering "no de-duplication" to paying subscribers? Personally I have no qualms with it, but I know some might, and they could be willing to pay for it.