Hacker News
by AbanoubRodolf 85 days ago
The raster image problem is real but there's a middle ground between "never invert" and a full NN classifier.

The author already computes BT.601 brightness per page. You can run the same calculation per-image bounding box instead of per-page, then add a bimodal pixel distribution check: if a raster image has most pixels near black or near white with few midtones, it's probably a line diagram or screenshot, not a photograph. That heuristic catches the main false-positive case (black-line diagrams on white backgrounds) with maybe 20 lines of image processing code.
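
A minimal sketch of that check, assuming the pixels inside one image's bounding box are already decoded to 8-bit RGB tuples. The function names and the three thresholds (`dark_max`, `light_min`, `max_midtone_fraction`) are hypothetical starting points for illustration, not values from the project:

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def bt601_luma(pixel: RGB) -> float:
    """BT.601 luma of an 8-bit RGB pixel, in the range 0..255."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def looks_like_line_art(
    pixels: Iterable[RGB],
    dark_max: float = 48.0,              # assumed cutoff for "near black"
    light_min: float = 208.0,            # assumed cutoff for "near white"
    max_midtone_fraction: float = 0.15,  # assumed tolerance for anti-aliasing
) -> bool:
    """Return True if almost all pixels are near black or near white.

    Few midtones suggests a line diagram or screenshot (probably safe to
    invert); many midtones suggests a photograph (probably leave alone).
    """
    total = 0
    midtones = 0
    for pixel in pixels:
        luma = bt601_luma(pixel)
        total += 1
        if dark_max < luma < light_min:
            midtones += 1
    if total == 0:
        return False  # nothing to classify, so do not invert
    return midtones / total <= max_midtone_fraction


# Usage: black lines on a white background vs. a smooth grayscale ramp.
diagram = [(255, 255, 255)] * 95 + [(0, 0, 0)] * 5
ramp = [(v, v, v) for v in range(256)]
print(looks_like_line_art(diagram), looks_like_line_art(ramp))
```

Note that this only counts midtones, so a very dark or very bright photograph would also pass; requiring some pixels in both extremes would tighten it at the cost of a few more lines.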

It won't be perfect, and gwern's point stands that a properly trained classifier would be more accurate. But for a PDF viewer where you're already parsing content streams to get image coordinates, it's a lot cheaper than shipping a model and handles 80% of the problematic cases. The remaining edge cases (medical scans, thermal images) are rare enough that the per-page toggle is a reasonable fallback.

2 comments

FWIW, we did consider a histogram heuristic, and I believe GreaterWrong still uses one rather than InvertOrNot.com. But I regularly saw images on GW where the heuristic got it wrong and ION got it right, so the accuracy gap was meaningful. That's why we went with ION rather than porting over the histogram heuristic.

Really appreciate this, AbanoubRodolf, thank you. The brightness analysis code and the image bounds are both already in the project; I just never connected the two. The distance between where I am and where you're suggesting I go is really short. Feedback like this is exactly why I posted here. Thanks again.