by atommclain 765 days ago
I have a vague memory of someone being able to fairly accurately estimate redacted words and phrases in a government document by using the size of the blacked-out portion along with the font metrics. I think the safest way to redact text would be to first replace every redacted span with the same placeholder text (maybe something like "etaoin shrdlu" from the hot type era) and then black it all out. Every box would then be the same size, so even less information would leak.
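To make the placeholder idea concrete, here is a minimal sketch. It assumes the spans to redact are already marked with a hypothetical `[[...]]` convention; the marker syntax and the function name `normalize_redactions` are illustrative, not taken from any real redaction tool.

```python
import re

# Fixed placeholder, identical for every redaction, so box size carries no signal.
PLACEHOLDER = "etaoin shrdlu"

def normalize_redactions(text: str) -> str:
    """Replace the content of every [[...]] span with the same placeholder.

    The [[...]] marker is a hypothetical convention for "redact this";
    the blacking out itself would happen in a later rendering step.
    """
    return re.sub(r"\[\[.*?\]\]", "[[" + PLACEHOLDER + "]]", text)
```

For example, `normalize_redactions("Witness [[Mark]] met [[Elizabeth]].")` would yield two spans of identical length, so the rendered boxes no longer distinguish the two names.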
3 comments

This is especially true when the list of potential words is known. So if you know the court case is about Mark, Sandeep, and Elizabeth, and the names are redacted with boxes, then it's trivial to unredact each name by just looking at the length of the boxes.
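A rough sketch of that matching, assuming made-up per-character widths rather than real font metrics (the numbers in `char_width` and the names `text_width` and `guess_names` are purely illustrative):

```python
def char_width(c: str) -> float:
    """Hypothetical advance width in points for a proportional font.

    These values are invented for illustration; a real attack would read
    them from the actual font's metrics.
    """
    if c in "iljt":
        return 3.0   # narrow lowercase letters
    if c in "mw":
        return 8.0   # wide lowercase letters
    if c.isupper():
        return 7.0   # capitals
    return 5.0       # everything else

def text_width(word: str) -> float:
    """Total rendered width of a word under the assumed metrics."""
    return sum(char_width(c) for c in word)

def guess_names(box_width: float, candidates: list[str],
                tolerance: float = 0.5) -> list[str]:
    """Return the candidates whose rendered width fits the redaction box."""
    return [name for name in candidates
            if abs(text_width(name) - box_width) <= tolerance]
```

With the candidate list from the court case, each name renders to a different width under these assumed metrics, so a measured box width picks out a single name.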
You are correct. Font character-width analysis + NLP = Exposure
Got it, use fixed-width fonts for everything.
That would make it even easier to determine the length of the redacted word.
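To illustrate why: with a fixed-width font, the box width divided by the per-character advance gives the character count directly. A minimal sketch, where the 6.0 pt advance and the function names are assumptions for illustration:

```python
def redacted_length(box_width: float, advance: float = 6.0) -> int:
    """Number of characters hidden by a box in a monospace font.

    `advance` is the (assumed) width of every character in points.
    """
    return round(box_width / advance)

def candidates_by_length(box_width: float, candidates: list[str],
                         advance: float = 6.0) -> list[str]:
    """Keep only the candidates with exactly the recovered character count."""
    n = redacted_length(box_width, advance)
    return [name for name in candidates if len(name) == n]
```

Under these assumptions a 42 pt box hides exactly 7 characters, which among Mark, Sandeep, and Elizabeth leaves only Sandeep.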