Hacker News
by needcaffeine 2745 days ago
What is the right way to redact a pdf?

Adobe Acrobat has a redaction tool that will add the black box AND remove the underlying text from the document. It is not mere black highlighting.
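
Acrobat's internals aren't shown here, but a toy model makes clear what "remove the underlying text" means. This is a minimal sketch under loud assumptions: an uncompressed content stream where each text run is one `BT [font size Tf] x y Td (text) Tj ET` group. Real streams also use `TJ` arrays, text matrices, custom font encodings and compression. `redact` is a hypothetical helper that deletes the text-showing operators in the region *before* painting the box:

```python
import re

# Toy model only: one text run per "BT [font size Tf] x y Td (text) Tj ET" group.
TEXT_RUN = re.compile(
    r"BT\s+(?:/\S+\s+[\d.]+\s+Tf\s+)?"          # optional font selection
    r"(?P<x>[\d.]+)\s+(?P<y>[\d.]+)\s+Td\s+"    # text origin
    r"\((?P<text>[^)]*)\)\s*Tj\s+ET"             # the string being shown
)

def redact(stream, x0, y0, x1, y1):
    """Delete text runs whose origin lies in the box, then paint the box black.

    Hypothetical helper: a run that starts outside the box but extends into it
    is kept, which real redaction tools avoid by working glyph by glyph.
    """
    def keep_or_drop(m):
        x, y = float(m["x"]), float(m["y"])
        return "" if (x0 <= x <= x1 and y0 <= y <= y1) else m.group(0)

    cleaned = TEXT_RUN.sub(keep_or_drop, stream).strip()
    # "0 0 0 rg" sets a black fill colour; "x y w h re f" fills a rectangle.
    return f"{cleaned}\n0 0 0 rg {x0} {y0} {x1 - x0} {y1 - y0} re f"

page = ("BT /F1 12 Tf 72 700 Td (sensitive text) Tj ET\n"
        "BT /F1 12 Tf 72 680 Td (public text) Tj ET")
out = redact(page, 70, 695, 200, 715)
print(out)
```

The order matters: deleting first and boxing second means nothing under the box survives for a text extractor to find.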
The only truly safe way is to print it, redact with a Sharpie and scan it back in. Not everyone is versed enough in Adobe Acrobat to do it the right way.
This will mark the printed (and scanned) document with patterns that allow the government and/or police to identify your printer.
1) Irrelevant in the case of this court filing, as the lawyer's name is attached.

2) In the case where this might be relevant there's a much easier way - export PDF pages to high resolution PNG, edit to put a black box over the text, and then combine the PNGs into a PDF again. This also has the benefit of stripping any PDF metadata that may have been present in the original.
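
Once the page is pixels, the black-box step is trivial. Below is a minimal sketch assuming the page has already been exported to an 8-bit grayscale raster; the PDF-to-image export and the PNG-to-PDF recombination are left to whatever tools you use. `black_box` and `to_png` are hypothetical helpers. The encoder writes only the IHDR, IDAT and IEND chunks, so no text layer or metadata can carry over:

```python
import struct
import zlib

def black_box(pixels, x0, y0, x1, y1):
    """Return a copy of a grayscale raster with [x0, x1) x [y0, y1) set to 0 (black)."""
    return [[0 if x0 <= x < x1 and y0 <= y < y1 else v for x, v in enumerate(row)]
            for y, row in enumerate(pixels)]

def chunk(tag, data):
    # A PNG chunk is: length, 4-byte type, data, CRC-32 over type + data.
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", zlib.crc32(tag + data))

def to_png(pixels):
    """Encode an 8-bit grayscale raster as a PNG with only IHDR, IDAT and IEND chunks."""
    height, width = len(pixels), len(pixels[0])
    # width, height, bit depth 8, colour type 0 (grayscale), default compression/filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter byte 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))

page = [[255] * 8 for _ in range(4)]  # blank white 8x4 "page"
page[1][2] = page[1][3] = 0           # pretend these dark pixels are the sensitive text
redacted = black_box(page, 1, 0, 5, 3)
png = to_png(redacted)                # write these bytes to a .png file to view them
```

Because the box overwrites pixel values rather than layering on top of them, the original dark pixels simply no longer exist in the output.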

FWIW, (2) doesn't strip any steganographic watermark that might already be embedded in the document but is invisible to the naked eye. Of course, there's no reliable way to do that anyway. And steganography could be used in ways that are visible to the human eye (i.e., that survive printing/scanning) but are still imperceptible with only a single copy of the document.
You can strip the steganographic watermark by having somebody retype the information from the document, but that doesn't defeat the barium meal.
Not necessarily! Consider rearranged words or punctuation in different versions of the document. You'd have to do something like type up a paraphrase to get around that.
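
The "barium meal" (also called a canary trap) from the last two comments can be sketched as follows. The template, recipient names and helper functions are all hypothetical. Each slot holds interchangeable wordings, so k two-way slots give 2^k distinguishable copies. Identification compares words, not pixels, so a retyped copy still matches while a paraphrase does not:

```python
from itertools import product

# Hypothetical template: plain strings are fixed, tuples hold interchangeable wordings.
TEMPLATE = ["The filing", ("was", "has been"), "sent to",
            ("the court", "the judge"), ("today", "this morning")]

def variants(template):
    """Yield every rendering of the template, one per combination of choices."""
    slots = [part if isinstance(part, tuple) else (part,) for part in template]
    for combo in product(*slots):
        yield " ".join(combo)

def assign(recipients, template):
    """Give each recipient a distinct variant of the document."""
    texts = list(variants(template))
    if len(recipients) > len(texts):
        raise ValueError("not enough variants for every recipient")
    return dict(zip(recipients, texts))

def normalize(text):
    # Ignore case and spacing so a retyped copy still matches.
    return " ".join(text.lower().split())

def identify(leaked, copies):
    """Return the recipients whose copy matches the leaked text."""
    return [who for who, text in copies.items() if normalize(text) == normalize(leaked)]

copies = assign(["alice", "bob", "carol"], TEMPLATE)
print(identify("the filing was  sent to the judge today", copies))  # ['carol']
```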
Sobel filter afterwards.
Right, because people who aren't versed enough to use proper digital redaction tools know how to apply a Sobel filter?
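
For reference, a Sobel filter is a 3x3 edge detector. It replaces each pixel with the local gradient magnitude, so flat regions go to zero and only sharp intensity changes survive. The thread doesn't say exactly how it would be applied to a redacted scan. This minimal pure-Python sketch, on a grayscale raster stored as a list of rows, only shows what the filter computes:

```python
import math

# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel(img):
    """Gradient magnitude of a grayscale image (list of rows); border pixels stay 0.

    Kernels are applied as a correlation; flipping them would only change the
    sign of gx/gy, not the magnitude.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out

edge = [[0, 0, 255, 255] for _ in range(4)]  # a vertical dark-to-light edge
print(sobel(edge))
```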
Redacting with a Sharpie might still leave the text visible via gamma futzing or other Photoshop tricks. Print, cut out with an X-Acto knife, then scan is an improvement on print, Sharpie, scan.
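
Why marker can leak: ink is dark but not perfectly opaque or uniform, so a scan may record the covered letters as slightly darker than the surrounding ink. A toy illustration follows. The pixel values are made up (40 for ink over blank paper, 28 for ink over toner), and `stretch` is a linear levels adjustment standing in for gamma tweaks. It turns a 12-level difference into a glaring one. A cut-out region, assumed here to scan as uniform backing, has no difference to amplify:

```python
def stretch(pixels, lo, hi):
    """Linearly map [lo, hi] onto [0, 255], clipping outside values.

    A crude stand-in for the levels/gamma adjustments an image editor offers.
    """
    scale = 255 / (hi - lo)
    return [[max(0, min(255, round((v - lo) * scale))) for v in row] for row in pixels]

# Made-up scan values along one row: 40 = marker over blank paper, 28 = marker over toner.
marker = [[40, 28, 28, 40, 28, 40]]
# A cut-out region scans as whatever is behind the hole, assumed uniform here.
cut_out = [[250, 250, 250, 250, 250, 250]]

print(stretch(marker, 20, 45))   # covered letters now stand out clearly
print(stretch(cut_out, 20, 45))  # uniform in, uniform out: nothing to recover
```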
Export the page in question to a raster image format like PNG at a high resolution, use an erasing tool on it, re-import the whole thing as a high-res raster (not OCR).

In the past 15 years I have seen so many poorly redacted PDFs from government agencies, where they just draw a black box over the PDF-native text.
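
The failure is easy to see at the content-stream level. A filled rectangle is just more paint on top, and the text-showing operators underneath are untouched. Anything that reads the text (copy and paste, search, a text extractor) still finds it. This is a toy sketch assuming an uncompressed stream with plain `Tj` strings. Real files often use compressed streams, `TJ` arrays and font encodings, which real extractors handle. `extract_text` is a hypothetical helper:

```python
import re

# Hypothetical, uncompressed toy content stream: text is drawn first, then a black box over it.
# "rg" sets the fill colour, "re" adds a rectangle, "f" fills it.
STREAM = """BT /F1 12 Tf 72 700 Td (sensitive text) Tj ET
0 0 0 rg 70 695 90 16 re f"""

def extract_text(stream):
    """Pull the literal strings out of Tj operators, ignoring anything painted on top."""
    return re.findall(r"\(([^)]*)\)\s*Tj", stream)

print(extract_text(STREAM))  # ['sensitive text']
```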

And then OCR it afterwards for accessibility. Mueller hasn't messed this up yet, sadly.
Not that it's hard to guess who "Individual-1" is.
Delete the text you're redacting?