| HN Mirror

It is essentially OCR where the alphabet is constructed on the fly from the document itself.

A major and highly pertinent difference is that if this OCR-ish procedure incorrectly classifies two identical letters as being different, accuracy is not affected, and the only consequence is a larger file. With normal OCR, seeing two As and saying they're different would be an error, but in this case, it's fine.

What this means is that, while regular OCR is inherently error-prone, this compression procedure can be fully tuned anywhere between no errors and nothing but errors, with file size being the tradeoff.

The ability to run this algorithm in a way that produces no errors may be enough to disqualify it as "OCR", depending on your point of view. In any case, it certainly changes things from "that's just how it is" to "this is a royal cock-up on Xerox's part".