What if documents of Pandora Papers were enriched with invisible anti-leak marks
6 points by juliet_dem 1724 days ago
What if every document in the Pandora Papers had carried invisible marks that could identify who leaked it? Just imagine! And not only in that case: in any situation it is important to be able to trace the source of a leak of your own documents, or of documents containing your personal data. Some people use visible watermarks, but those can be removed. Only invisible, steganographic techniques can really protect your documents. Do you know of any solutions that help with this?
3 comments

By definition, a truly "invisible" watermark cannot be checked. As soon as two copies of the same document differ, the presence of a watermark becomes obvious; if they don't differ, there is no watermark.
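
To illustrate the comparison argument, here is a minimal Python sketch. It assumes two copies of equal length held as raw bytes; the name `differing_positions` is made up for this example, and real document formats would need a format-aware comparison.

```python
def differing_positions(copy_a: bytes, copy_b: bytes) -> list[int]:
    """Return the byte offsets at which two same-length copies differ."""
    if len(copy_a) != len(copy_b):
        raise ValueError("this simple comparison assumes equal-length copies")
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


# Two hypothetical copies of the "same" document handed to different people.
copy_one = b"Total due: 100 USD."
copy_two = b"Total due: 100 USD,"

diff = differing_positions(copy_one, copy_two)
if diff:
    print("copies differ at offsets", diff, "so some mark is present")
else:
    print("copies are identical, so no per-copy mark is present")
```

An empty result means the copies carry no per-recipient mark; any non-empty result tells the holder of both copies where to look.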
As far as I know, there are technologies that use a steganographic approach rather than invisible watermarks, which lets them create millions of unique copies of each document.
> As soon as two copies of the same document differ, the presence of a watermark becomes obvious;

I don't think that part is obvious. Imagine 10 photos or scans of the same page with slightly different cropping. This is quite normal: it's practically impossible to photograph or scan a document and get exactly the same file twice.

This is a naturally occurring watermark rather than an intentional one. Scanned images are unlikely to afford sufficient stealth to detect leaks (the potential leakers know that the image they scanned is unique and others have different scanned images). For the standard setup of distributing watermarked documents to people, differently scanned images are unsubtle and therefore not as useful as, say, doctoring the least significant bits of JPEG coefficients.
Naturally occurring differences are necessary if you want to embed a watermark; otherwise detecting it is trivial: you just need to compare two copies. A photo or a scan of a text document can easily weigh 2 MB, which is enough data to embed a watermark in a way that is reasonably difficult to detect. Embedding watermarks that survive re-encoding while remaining hard to detect is far more difficult, though.
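
A rough sketch of the least-significant-bit idea mentioned in this subthread, in Python. It is only an illustration under stated assumptions: it works on a plain list of 8-bit sample values rather than real JPEG coefficients, `embed_id` and `extract_id` are hypothetical names, and a mark like this would not survive re-encoding, which is the hard part noted above.

```python
def embed_id(samples: list[int], recipient_id: int, n_bits: int = 16) -> list[int]:
    """Write recipient_id into the lowest bit of the first n_bits samples."""
    if len(samples) < n_bits:
        raise ValueError("not enough samples to hold the ID")
    if not 0 <= recipient_id < (1 << n_bits):
        raise ValueError("recipient_id does not fit in n_bits")
    marked = list(samples)
    for i in range(n_bits):
        bit = (recipient_id >> i) & 1
        # Clear the lowest bit, then set it to the ID bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_id(samples: list[int], n_bits: int = 16) -> int:
    """Read the ID back from the lowest bit of the first n_bits samples."""
    value = 0
    for i in range(n_bits):
        value |= (samples[i] & 1) << i
    return value


# Hypothetical 8-bit samples standing in for scanned image data.
scan = list(range(100, 132))
marked_scan = embed_id(scan, recipient_id=0xBEEF)
print(hex(extract_id(marked_scan)))
```

Each sample changes by at most 1, which is why the mark is hard to see in one copy, and 16 bits already distinguish 65,536 recipients. Comparing two differently marked copies would still expose the altered positions unless natural scan noise hides them.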
With such a large trove of data from so many different companies, I think someone or some group simply hacked them. How else would you get this stash?

https://en.m.wikipedia.org/wiki/Pandora_Papers#Data_sources

you have to compare (at least) two independently sourced copies of any document to make sure it hasn't been adulterated by third-parties in-transit

attribution doesn't even have to be hidden: if it's easy enough for someone to stamp unique watermarks for every person's copy of their pdfs, it's just as easy to ++/--/randomize any field, which can just as easily reveal the source (only employee xyz had value 123.010 in field abc, everyone else had 123.01 or 0123.01)