by draugadrotten 3657 days ago
Dropbox has been known to search users' data for files matching md5 sums someone claims they own, such as movies or known illegal photos.

http://www.extremetech.com/computing/179495-how-dropbox-know...

2 comments

There should be a feature in torrent clients that adds some random bytes to videos when downloading them, without breaking the video itself, so md5 checks would be useless for illegal content downloads.
That would make seeding it impossible though, as you wouldn't be able to verify that the chunk is what you claim it is, and it would eventually lead to the whole video being corrupt.
As long as the torrent client knows how to reverse the changes it made to the file (why wouldn't it?), there won't be any problem.

It's just a really, really simple form of "encryption", which most torrent clients do support.

And then Dropbox runs that on every file, so you have the normal md5 and the torrent-obfuscation-reversed md5. They check both. We have now achieved nothing.
What about having a client-side only "corruption" function that is unique for each client? The file is visually the same, the md5 hash is different but when the torrent client is sending the data it just "uncorrupts" the file. When the file is received by another client, that one person's client will take care of uniquely corrupting their own version for storage.
This is not possible. The torrent protocol works by checking the hash of each piece it downloads so it can reseed the file. If the files were changed by even 1 bit, they couldn't be reseeded back into the torrent network.
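For reference, here is a minimal sketch of the per-piece check being described, assuming SHA-1 piece hashes as in the original BitTorrent spec. The 16-byte piece size and the helper names `piece_hashes` and `bad_pieces` are made up for illustration; real torrents use much larger pieces and take the expected hashes from the .torrent metadata.

```python
import hashlib

PIECE_LEN = 16  # toy piece size, for illustration only


def piece_hashes(data, piece_len=PIECE_LEN):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_len]).digest()
        for i in range(0, len(data), piece_len)
    ]


def bad_pieces(data, expected, piece_len=PIECE_LEN):
    """Return the indices of pieces whose hash does not match the expected list."""
    actual = piece_hashes(data, piece_len)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


original = bytes(range(64))          # stand-in for file contents (4 pieces)
expected = piece_hashes(original)    # what the torrent metadata would carry

tampered = bytearray(original)
tampered[20] ^= 0x01                 # flip a single bit inside piece 1

print(bad_pieces(original, expected))         # []
print(bad_pieces(bytes(tampered), expected))  # [1]
```

A peer receiving the tampered piece would reject it, which is the objection above. It only applies if the modified bytes are what gets sent over the wire.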
Hashes, even md5, are pretty good about going nuts when even one bit is changed in the input. And video codecs (speaking very broadly) are tolerant of a bit error rate like 1e-9 or they'd be useless over the air or on optical media. So simply have your torrent client randomly flip 1 in a billion bits as it downloads. The md5 will never match and the movie quality will be unimpaired.
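A quick way to see the "going nuts" part for yourself, as a sketch: flip exactly one bit in some stand-in data and compare the md5 digests. The sample bytes and the helper names `flip_bit` and `differing_bits` are made up for illustration. This says nothing about how a real video decoder copes with the flipped bit.

```python
import hashlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bits(a, b):
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


data = b"stand-in for video bytes " * 1000   # 25,000 bytes of fake "video"
flipped = flip_bit(data, 12345)

d1 = hashlib.md5(data).digest()
d2 = hashlib.md5(flipped).digest()

print(differing_bits(data, flipped), "input bit differs")
print(d1.hex())
print(d2.hex())
print(differing_bits(d1, d2), "of 128 digest bits differ")
```

With a well-behaved hash you would expect roughly half of the digest bits to change, even though the inputs differ in a single bit.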
So if a billion people were to download it, lucky number one-billion would receive a completely garbage file.

OK, OK. That's not statistically likely to happen. But you'd still have a problem with the other files being shared via BitTorrent; it's not all movie files. You'd also basically have to restart the entire BT network, as clients would no longer be backwards compatible. Good luck getting every single torrent client dev to implement this at the same time!

Except each client could have its own seed.
If it's reversible like that it is still possible for PayPal and others to implement it too.
It's encryption: you apply a transform before putting the file onto Dropbox, and then you apply the reverse transform after you pull it back. The key it's encrypted against would be per user (like most encryption).
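Roughly what that could look like, as a toy sketch only. The XOR keystream built from SHA-256 below is an assumption chosen to keep the example short and stdlib-only; it is not a vetted cipher, and the function names and keys are hypothetical. Anyone doing this for real would use an established encryption library.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from key (toy construction, not vetted)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def transform(data, key):
    """XOR data with the keystream; applying it twice with the same key undoes it."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


original = b"stand-in for a downloaded file" * 10

stored_alice = transform(original, b"alice-secret")   # what lands in cloud storage
stored_bob = transform(original, b"bob-secret")       # same file, different user

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(stored_alice).hexdigest())
print(hashlib.md5(stored_bob).hexdigest())

restored = transform(stored_alice, b"alice-secret")
print(restored == original)  # True
```

Because the key is per user, the stored copies hash differently from the original and from each other, and only the key holder can reverse the transform.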
Aren't the files matched by checksums though? I.e. you have a checksum and that's how the tracker knows what you want?
I guess GP meant that torrent client could e.g. pad a file on disk (safest option) and store that metadata along with original hash. Hashes would not change over the torrent network, but would be different in other networks
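A sketch of how that padding idea might work, under assumptions: the padding is random bytes appended to the end, and the metadata is a small dict holding the original length and md5. The names `pad_for_storage` and `unpad` are hypothetical, and whether a given video container still plays with trailing bytes is not something this example checks.

```python
import hashlib
import os


def pad_for_storage(data, pad_len=32):
    """Append random bytes for the on-disk copy and record how to undo it."""
    meta = {
        "original_len": len(data),
        "original_md5": hashlib.md5(data).hexdigest(),
    }
    return data + os.urandom(pad_len), meta


def unpad(padded, meta):
    """Strip the padding and confirm the result matches the recorded hash."""
    data = padded[:meta["original_len"]]
    if hashlib.md5(data).hexdigest() != meta["original_md5"]:
        raise ValueError("stored file does not match recorded hash")
    return data


original = b"stand-in for video bytes " * 100
on_disk, meta = pad_for_storage(original)

print(hashlib.md5(original).hexdigest())   # hash the swarm would see
print(hashlib.md5(on_disk).hexdigest())    # hash of the padded copy on disk
print(unpad(on_disk, meta) == original)    # True
```

The client would serve `unpad(on_disk, meta)` to peers, so hashes inside the torrent network stay the same, while anything hashing the file on disk sees a different value.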
They'd use some other method of fingerprinting then. It'd be an interesting arms race.
Yeah, but checking an md5 against a "list of pirated movies" database is a lot different from sending data about file uploads to a random company with no promise of what they will do with it.