Hacker News
by moonchild 693 days ago
They are talking about treating OCR as lossy. I wonder about making a lossless compression algorithm for text scans based on OCR: in effect, use the OCR output to predict which text will show up and how it will look, then encode the pixel-level differences between that prediction and the actual scan on top of it.
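
A rough sketch of what that could look like, purely as an illustration: render the recognized text into a predicted bitmap, XOR it against the real scan, and compress only the residual. Everything here is an assumption for the sake of the example. `GLYPHS` is a made-up 3x3 "font" standing in for a real renderer, the recognized text is passed in by hand instead of coming from an OCR engine, and zlib stands in for whatever entropy coder one would actually pick.

```python
import zlib

# Hypothetical 3x3 glyph templates standing in for a real font renderer.
GLYPHS = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
}

def render(text):
    """Render text to a flat bitmap, one byte (0 or 1) per pixel, row by row."""
    rows = ["".join(GLYPHS[ch][r] for ch in text) for r in range(3)]
    return bytes(int(p) for row in rows for p in row)

def encode(scan, text):
    """Store the text plus the compressed XOR residual against the prediction."""
    predicted = render(text)
    assert len(predicted) == len(scan), "prediction and scan must be the same size"
    residual = bytes(a ^ b for a, b in zip(scan, predicted))
    return text, zlib.compress(residual)

def decode(text, packed):
    """Re-render the prediction and undo the XOR to recover the exact scan."""
    predicted = render(text)
    residual = zlib.decompress(packed)
    return bytes(a ^ b for a, b in zip(predicted, residual))

# A "scan" that matches the prediction except for one noisy pixel.
noisy = bytearray(render("LIL"))
noisy[4] ^= 1
scan = bytes(noisy)

text, packed = encode(scan, "LIL")
assert decode(text, packed) == scan  # round trip is lossless
```

The better the prediction, the closer the residual is to all zeros and the cheaper it is to store. A wrong OCR guess only costs extra bits; it never costs correctness, since the residual always restores the original pixels.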
1 comment

DjVu does this to some extent, identifying identical glyph bitmaps and reusing them for compression. See https://en.m.wikipedia.org/wiki/DjVu#Compression
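
To make the glyph-reuse idea concrete, here is a minimal sketch of exact-match deduplication: each distinct bitmap is stored once in a dictionary and the page becomes a list of indices into it. This is not DjVu's actual JB2 coder, just the general idea; the function names and the tuple-of-strings bitmap representation are assumptions made for the example.

```python
def build_dictionary(glyphs):
    """Store each distinct glyph bitmap once; return (dictionary, references)."""
    dictionary = []  # unique bitmaps, in order of first appearance
    index_of = {}    # bitmap -> position in dictionary
    refs = []        # one dictionary index per glyph on the page
    for glyph in glyphs:
        if glyph not in index_of:
            index_of[glyph] = len(dictionary)
            dictionary.append(glyph)
        refs.append(index_of[glyph])
    return dictionary, refs

def rebuild(dictionary, refs):
    """Expand the references back into the original glyph sequence."""
    return [dictionary[i] for i in refs]

# Bitmaps as tuples of row strings so they are hashable.
A = ("010", "101", "111")
B = ("110", "111", "110")
page = [A, B, A, A]

dictionary, refs = build_dictionary(page)
assert rebuild(dictionary, refs) == page
```

Exact matching only pays off when scanned glyphs are pixel-identical, which noise often prevents. That is where the residual idea from the parent comment would come in: match to the nearest dictionary entry and store the differences.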