Fear of watermarking is probably why the leaked documents are what they are. A court order and a training slide deck are the kinds of things that people are authorized to distribute internally.
Which is why you need a co-leaker. Dangerous, yes, but you can at least compare your copies of a document against each other. Extract the text, strip the Unicode down to ASCII, and normalize the whitespace...
Hell, even have it transcribed by a typist. Full air-gap. This whole leaking business needs to be turned into an SEO-optimized, translated wiki page.
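For what it's worth, the "strip it down to ASCII, fix the whitespace, then compare" step could look roughly like the sketch below. It is only an illustration using the Python standard library; the function names `normalize` and `differing_words` are made up here, and it assumes the text has already been extracted to plain strings. It does nothing about watermarks carried in the wording itself, which is what the comparison step is meant to surface.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> str:
    """Reduce text to plain ASCII with single spaces (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark and
    # folds compatibility characters (e.g. no-break space -> plain space).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop everything that is still not ASCII: combining marks, zero-width
    # characters, look-alike letters. Note these are dropped, not transliterated.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", ascii_only).strip()


def differing_words(copy_a: str, copy_b: str) -> list[str]:
    """List words that differ between two normalized copies (hypothetical helper)."""
    words_a = normalize(copy_a).split()
    words_b = normalize(copy_b).split()
    # ndiff marks words only in copy_a with "- " and only in copy_b with "+ ".
    return [d for d in difflib.ndiff(words_a, words_b) if d.startswith(("- ", "+ "))]
```

If two copies differ only in invisible characters or spacing, `differing_words` comes back empty; anything it does return is a wording difference that survived normalization and deserves a closer look.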