by simoneamico 81 days ago
Fixed. Full details in the commit here: https://github.com/simoneamico-ux-dev/veil/commit/9d09d9c

In short, veil now checks three simple signals to distinguish a scan with overlaid OCR text from a native PDF with images. The first is whether the image covers more than 40% of the page, meaning it dominates the surface. The second is whether the page has more than 200 characters of text, enough to be a document and not just a cover. The third is whether the image is predominantly blank paper rather than a photograph, verified by sampling the luminance of its pixels. When all three conditions are true, the image is no longer protected and the inversion applies to it normally. The same detection runs in the export path too. Thanks again for the file, importjelly.
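
For anyone who wants the logic at a glance, here is a rough Python sketch of the three checks. It is not the code from the commit: the function names, the input shapes, and the two luminance thresholds (`PAPER_LUMA_MIN`, `PAPER_FRACTION_MIN`) are assumptions made up for illustration. Only the 40% coverage and 200 character thresholds come from the description above.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

COVERAGE_MIN = 0.40        # from the comment: image covers more than 40% of the page
CHAR_MIN = 200             # from the comment: more than 200 characters on the page
PAPER_LUMA_MIN = 0.85      # assumption: a sample at least this bright counts as paper
PAPER_FRACTION_MIN = 0.70  # assumption: share of bright samples needed to call it paper


def luma(rgb: RGB) -> float:
    """Rough brightness in [0, 1]: Rec. 709 weights applied directly to 8-bit values."""
    r, g, b = (c / 255.0 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def looks_like_paper(samples: Sequence[RGB]) -> bool:
    """True when most sampled pixels are bright, as on a scanned sheet."""
    if not samples:
        return False
    bright = sum(1 for s in samples if luma(s) >= PAPER_LUMA_MIN)
    return bright / len(samples) >= PAPER_FRACTION_MIN


def is_scanned_page(image_area: float, page_area: float,
                    char_count: int, samples: Sequence[RGB]) -> bool:
    """All three signals must hold before the image is treated as a scan."""
    if page_area <= 0:
        return False
    dominates_page = image_area / page_area > COVERAGE_MIN
    has_enough_text = char_count > CHAR_MIN
    return dominates_page and has_enough_text and looks_like_paper(samples)
```

Requiring all three signals together is what keeps a photo-heavy native PDF protected: a full-page photograph passes the coverage check but fails the paper check, and a cover page fails the character count.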