| Befuddling that this happened again. It’s not the first time:
- Paul Manafort court filing (U.S., 2019)
  Manafort’s lawyers filed a PDF where the “redacted” parts were basically black highlighting/boxes over live text. Reporters could recover the hidden text (e.g., via copy/paste).
- TSA “Standard Operating Procedures” manual (U.S., 2009)
  A publicly posted TSA screening document used black rectangles that did not remove the underlying text; the concealed content could be extracted. This led to extensive discussion and an Inspector General review.
- UK Ministry of Defence submarine security document (UK, 2011)
  A MoD report had “redacted” sections that could be revealed by copying/pasting the “blacked out” text, because the text was still present, just visually obscured.
- Apple v. Samsung ruling (U.S., 2011)
  A federal judge’s opinion attempted to redact passages, but the content was still recoverable due to the way the PDF was formatted; copying text out revealed the “redacted” parts.
- Associated Press + Facebook valuation estimate in court transcript (U.S., 2009)
  The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
- A broader “history of failures” compilation (multiple orgs / years)
  The PDF Association collected multiple incidents (including several above) and describes the common failure mode: black shapes drawn over text without deleting/sanitizing the underlying content.
  https://pdfa.org/wp-content/uploads/2020/06/High-Security-PD... |
I've seen lawyers at major, high-priced law firms make this same mistake. Once it was a huge list of individuals' names and bank account balances. Fortunately, I was able to intervene just before the uploaded documents were made public.
Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
If the software is going to leverage the familiarity of a blackout marker to give you a simple mechanism for redacting text, it should honour that analogy and work the way any regular user would expect: by killing off the underlying text you're obscuring, along with any other corresponding hidden bits. Or it should surface those hidden bits so you can see what could come back to bite you later. E.g., it wouldn't be hard to make the redact tool simultaneously act as a highlighter that temporarily turns nearby text in the OCR layer a vibrant yellow as you use it.
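To make the two behaviours concrete, here's a rough sketch in plain Python. It is a toy model, not any real PDF library's API: `Rect`, `Span`, `exposed_spans` and `redact` are names I made up, and a "page" is assumed to be nothing more than a list of text spans with bounding boxes. `exposed_spans` is the "surface the hidden bits" option, and `redact` is the "actually kill the underlying text" option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page coordinates (hypothetical toy model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # True when the two boxes share any area; touching edges do not count.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class Span:
    """A run of selectable text plus where it is drawn on the page."""
    text: str
    box: Rect


def exposed_spans(spans, blackouts):
    """Text that is still in the page but sits under a blackout box."""
    return [s for s in spans if any(s.box.intersects(b) for b in blackouts)]


def redact(spans, blackouts):
    """Drop every span a blackout touches, instead of merely covering it."""
    hidden = set(exposed_spans(spans, blackouts))
    return [s for s in spans if s not in hidden]


# Made-up sample data: two labels and two sensitive values.
page = [
    Span("Account holder:", Rect(10, 10, 110, 22)),
    Span("Jane Doe", Rect(115, 10, 175, 22)),
    Span("Balance:", Rect(10, 30, 70, 42)),
    Span("12,345.67", Rect(75, 30, 140, 42)),
]
boxes = [Rect(112, 8, 180, 24), Rect(72, 28, 145, 44)]

# What copy/paste would still leak if the boxes were only drawn on top.
print([s.text for s in exposed_spans(page, boxes)])
# What is left in the page after a redaction that removes the text.
print([s.text for s in redact(page, boxes)])
```

The sketch is deliberately conservative: any span a box touches is removed whole. A real tool would presumably also need to deal with partially covered spans, the OCR layer, images and metadata, none of which this models.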