Hacker News new | ask | show | jobs
by jakobegger 2924 days ago
Yes, and "certificates" sounds like "certiticates".

Reminds me of a story about a copying machine that had a image compression algorithm for scans which changed some numbers on the scanned page to make the compressed image smaller. (Can't remember where I read about that, must have been a couple years ago on HN)

2 comments

It's the lossy jbig2 compression in Xerox copiers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...

And yes, I think this is a relevant comparison. As the entropy model becomes more sophisticated, errors are more likely to be plausible texts with different meaning, and less likely to be degraded in ways that human processing can intuitively detect and compensate for.

> t's the lossy jbig2 compression in Xerox copiers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_....

My understanding of this fault was that it was a bug in their implementation of JBIG2, not the actual compression? Linked article seems to support this.

I think it was just overly aggressive settings of compression parameters. I don't see any evidence that the jbig2 compressor was implemented incorrectly. Source: [1]

[1]: https://www.xerox.com/assets/pdf/ScanningQAincludingAppendix...

Right. Jbig2 supports lossless compression. I'm not very familiar with the bug, but it could have been a setting somewhere in the scanner/copier that it was changed to lossy compression instead. Or they had lossy compression on by default or misconfigured some other way (probably a bad idea for text documents).
The bad thing was that it used lossy compression when copying. That was the problem.
No. The bug was when using the "Scan to PDF" function. It happened on all quality settings. Copying (scanning+printing in one step, no PDF) was not effected.
No compression system in the world forces you to share parts of the image that shouldn't be shared. So that's true in a vacuous sense.

But the nature of the algorithm means that you have this danger by default. So it's fair to put some blame there.

This is a big rabbit hole of issues I'd never even considered before. Should we be striving to hide our mistakes by making our best guess, or make a guess, that if wrong, is easy to detect?
The algorithm detected similar patterns and replaced these with references. This lead to characters being changed into similar looking characters that also appeared on the page.
Xerox copier flaw changes numbers in scanned docs: https://www.theregister.co.uk/2013/08/06/xerox_copier_flaw_m...