The PDFs this produces are simply collections of PNGs, and won't be accessible. It's always a compromise though. If you try to edit the PDF by adding black boxes and removing hidden objects, you may still leak data via the tagged PDF text; it doesn't have to match up exactly with what's on the page. So converting to PNG isn't a terrible idea, but it would be nice to combine this with something that OCR'd the PNG conversion? e.g.
https://github.com/fritz-hh/OCRmyPDF
(which uses Tesseract under the hood). The other thing this is missing, compared with commercial redactors I've used, is the ability to assist in the redaction: e.g. removing SSNs, phone numbers, and all occurrences of key phrases.
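
To make the "assisted redaction" idea concrete, here is a rough Python sketch of the text-matching half. It is not taken from this tool or from any commercial redactor; the names `find_redactions` and `redact` are made up for illustration, and the patterns assume US-style SSNs (123-45-6789) and phone numbers only. It works on plain text (say, whatever the OCR pass gives you), and mapping the matched spans back to boxes on the page is left out entirely.

```python
import re

# Illustrative patterns only: US-style formats, and real data is messier.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)


def find_redactions(text, phrases=()):
    """Return sorted, merged (start, end) spans that should be redacted."""
    spans = [m.span() for rx in (SSN_RE, PHONE_RE) for m in rx.finditer(text)]
    for phrase in phrases:
        if phrase:  # skip empty strings, which would match everywhere
            pattern = re.escape(phrase)
            spans += [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching spans collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text, phrases=(), mask="█"):
    """Replace every matched span with mask characters of the same length."""
    out = []
    pos = 0
    for start, end in find_redactions(text, phrases):
        out.append(text[pos:start])
        out.append(mask * (end - start))
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = "SSN 123-45-6789, call (555) 123-4567 about Project Falcon."
    print(redact(sample, phrases=["project falcon"]))
```

In practice you'd want a human to review the suggested spans before anything gets blacked out, since patterns like these both miss real identifiers and flag things that aren't.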