| HN Mirror



	by matheusmoreira 1603 days ago
	These probably wouldn't survive extraction of the pure text, would they?

3 comments

out_of_protocol 1603 days ago

Replacing characters with identical-looking Unicode characters, adding extra spaces here and there, adding newlines (and more spaces :)), adding random typos, using a dictionary of "safe" word/phrase replacements, etc. And don't forget about formulas, charts, etc.: a pure-text version is not too useful on its own.
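
To make the look-alike character idea concrete, here is a minimal sketch, not anything the commenter posted. It assumes a tiny hypothetical table of three Latin letters and their Cyrillic look-alikes, and it assumes the original text contains no Cyrillic. The names `HOMOGLYPHS`, `embed`, `extract` and `strip_marks` are made up for illustration.

```python
# Hypothetical table: Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}


def embed(text, bits):
    """Hide bits in text: at each eligible letter, 1 swaps in the look-alike, 0 leaves it."""
    remaining = list(bits)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and remaining:
            out.append(HOMOGLYPHS[ch] if remaining.pop(0) else ch)
        else:
            out.append(ch)
    if remaining:
        raise ValueError("text has too few eligible letters for this payload")
    return "".join(out)


def extract(text, n_bits):
    """Read back the first n_bits (assumes the unmarked text had no Cyrillic)."""
    bits = []
    for ch in text:
        if len(bits) == n_bits:
            break
        if ch in HOMOGLYPHS:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
    return bits


def strip_marks(text):
    """Map every look-alike back to its Latin letter, erasing the watermark."""
    return "".join(REVERSE.get(ch, ch) for ch in text)
```

The mark survives plain-text extraction because the swapped code points are still in the text, but `strip_marks` shows how cheaply it can be removed once someone knows which table was used.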


hdjjhhvvhga 1603 days ago

If you deal with fiction and the like where you basically have just text then I think that's correct: it would be trivial to detect the watermarks in various copies by simply comparing them. I was dealing with PDFs containing tables, formulas, illustrations, etc., so a plain-text version would be unusable.


snovv_crash 1603 days ago

Randomly choose 3 big paragraphs in the entire ebook and add an extra newline in the middle of each, at the end of a random sentence. This would be my choice if I had to do some kind of invisible watermarking, at least.
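
A rough sketch of that newline scheme, under assumptions the comment does not spell out: the book is a list of paragraph strings, a "big" paragraph is at least `min_len` characters, a sentence end is the two characters `". "`, and the random choices are seeded with a hypothetical `buyer_id` so the same buyer always gets the same marks.

```python
import random


def _middle_sentence_ends(paragraph):
    """Indices of '. ' sentence ends that fall in the middle half of the paragraph."""
    lo, hi = len(paragraph) // 4, 3 * len(paragraph) // 4
    return [j for j in range(lo, hi) if paragraph[j:j + 2] == ". "]


def watermark_newlines(paragraphs, buyer_id, count=3, min_len=200):
    """Return (marked_paragraphs, marked_indices) for one buyer."""
    rng = random.Random(buyer_id)  # seeded: same buyer_id -> same marks
    big = [i for i, p in enumerate(paragraphs)
           if len(p) >= min_len and _middle_sentence_ends(p)]
    chosen = rng.sample(big, min(count, len(big)))
    out = list(paragraphs)
    for i in chosen:
        p = out[i]
        j = rng.choice(_middle_sentence_ends(p))
        # Swap the space after the sentence-ending period for a newline.
        out[i] = p[:j + 1] + "\n" + p[j + 2:]
    return out, sorted(chosen)
```

Calling `watermark_newlines(paragraphs, "buyer-42")` twice gives identical output, so a leaked copy could be matched to a buyer by recomputing the marks for each candidate ID.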


hdjjhhvvhga 1603 days ago

This is one of the many things that could be trivially detected and fixed when you have multiple watermarked copies of the same file.
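
For illustration, the comparison could be as small as the sketch below. It assumes both copies are already available as plain strings; `differing_spans` is a made-up helper name, and it relies only on `difflib.SequenceMatcher` from Python's standard library.

```python
import difflib


def differing_spans(copy_a, copy_b):
    """Return (fragment_from_a, fragment_from_b) pairs where the two copies disagree."""
    matcher = difflib.SequenceMatcher(None, copy_a, copy_b, autojunk=False)
    return [(copy_a[i1:i2], copy_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

Every pair it returns is a candidate watermark site: an extra newline shows up as a space in one copy against a newline in the other, and a look-alike character as a letter against its twin. With three or more copies, taking the majority reading at each site would be one way to rebuild an unmarked text.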
