by faxmeyourcode 699 days ago
So my username is a little less ridiculous than I originally thought? :)

The fact that this can introduce OCR bugs into your C code is hilarious, and this is diabolical:

    #define one ( 4 - 3 )
    #define eleven ( 3 + 4 + 4 )

Source code is here: https://github.com/lexbailey/compilerfax
4 comments

> OCR bugs

Especially if your fax machine uses JBIG2 compression. See: https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...

I think it's more appropriate to link directly to Kriesel's blog¹ or his talk, as that's about the scanner creating fake data and not about RCE. Though technically that isn't an OCR bug either, as there's no OCR in JBIG2.

¹: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...

I wonder if OCR could be improved by adding a "language model" of sorts...

Like, sure, maybe it's hard to tell apart a "1", "i", or "l" purely visually, but if you knew it was supposed to be code, I'd suspect one could significantly improve the recognition accuracy if the system just worked in the probability of each confusable option given the preceding (and following) text.
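To make the idea concrete, here is a rough sketch in C. It is not a real language model: it swaps in a tiny keyword dictionary as the "prior" and only accepts a correction when exactly one keyword fits. The confusable groups, the keyword list, and the name `guess_keyword` are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed groups of glyphs an OCR engine might mix up (illustrative only). */
static const char *const CONFUSABLE[] = { "1lIi|", "0Oo", "5S", "2Z", "8B" };

/* A tiny stand-in for a language model: a handful of C keywords. */
static const char *const KEYWORDS[] = {
    "if", "int", "else", "while", "for", "void", "return", "long", "double"
};

/* True if a and b are identical or belong to the same confusable group. */
static int same_glyph_class(char a, char b)
{
    if (a == '\0' || b == '\0') return 0;
    if (a == b) return 1;
    for (size_t i = 0; i < sizeof CONFUSABLE / sizeof CONFUSABLE[0]; i++) {
        if (strchr(CONFUSABLE[i], a) != NULL && strchr(CONFUSABLE[i], b) != NULL)
            return 1;
    }
    return 0;
}

/* True if token could be an OCR misreading of keyword. */
static int could_be(const char *token, const char *keyword)
{
    size_t n = strlen(token);
    if (n != strlen(keyword)) return 0;
    for (size_t i = 0; i < n; i++) {
        if (!same_glyph_class(token[i], keyword[i])) return 0;
    }
    return 1;
}

/* Returns the single keyword that token could be, or NULL if none or several fit. */
const char *guess_keyword(const char *token)
{
    assert(token != NULL);
    const char *match = NULL;
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (could_be(token, KEYWORDS[i])) {
            if (match != NULL) return NULL; /* ambiguous: refuse to guess */
            match = KEYWORDS[i];
        }
    }
    return match;
}
```

With this list, `guess_keyword("wh1le")` gives `"while"` and `guess_keyword("f0r")` gives `"for"`, while an unknown token such as `"xyz"` gives `NULL`. Refusing to guess on ambiguity is a deliberate choice: a confident wrong correction is harder to spot than a failed one.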

This would also have a higher risk of introducing some nasty, hard to spot errors.

It's actually better for the compilation to fail than for Clippy to make up something that is syntactically valid and compiles, but is wrong.

You might be right in a practical sense, but for an art project like this, maybe not?
Need a proper preprocessor to take a code file and make it OCR-safe by substituting for dangerous glyphs.
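A first step toward that could be a checker that just counts the risky glyphs in a line, leaving the actual substitution for later. The sketch below assumes a made-up set of "dangerous" characters (`1 l I | 0 O`); the set and the name `count_risky_glyphs` are illustrative, and the right set would depend on the font and the OCR engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of glyphs that are easy to misread; tune for your font. */
static const char RISKY_GLYPHS[] = "1lI|0O";

/* Counts how many characters in src belong to the risky set. */
size_t count_risky_glyphs(const char *src)
{
    assert(src != NULL);
    size_t count = 0;
    for (const char *p = src; *p != '\0'; p++) {
        if (strchr(RISKY_GLYPHS, *p) != NULL) {
            count++;
        }
    }
    return count;
}
```

Under that assumed set, `count_risky_glyphs("#define one ( 4 - 3 )")` is 0, while `count_risky_glyphs("int i = 10;")` is 2 (the `1` and the `0`).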
This might be a good reason to support trigraphs again! https://en.wikipedia.org/wiki/Digraphs_and_trigraphs_(progra...

edit: fixed link, copy paste fail dropped the ++

Amateur! Use a barcode font!
monospace font OCR-B