by alexknvl 3326 days ago
https://archive.fo/VCz3I/22b9ae6e1cfad9328490576e8ddb4549f87...

Now, I have no idea which one is the original, but AFAIU once you take a jpg and scan it into a PDF, it's easy to move the text around (?).

EDIT: I hope my implication is clear. I think that the jpg file is more likely to be the original photo. It was later scanned as a PDF. Then the OCR'd version was edited a bit by merging parts of the text (or only parts of it were OCR'd). Then a journalist opened it up in an editor and saw those layers.

Going in the opposite direction would require someone to produce an authentic-looking jpg from an obviously OCR'd PDF. That seems less likely to me.

1 comment

When software OCRs a PDF, it adds an invisible text layer aligned with the original text, while leaving the original text visible. This makes the PDF searchable without changing the font, introducing OCR errors where people can see them, or disturbing the background. What we see here is unambiguously the result of a PDF-editing program, not a scan+OCR.
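
To make the "invisible text layer" point concrete, here is a rough sketch of how one could check for it. In the PDF format, the `Tr` operator sets the text rendering mode, and mode 3 means the text is neither filled nor stroked, i.e. invisible. The function below walks an already-decoded page content stream and reports, for each text-showing operator, whether the text would be visible. Everything here is an assumption for illustration: the name `text_visibility`, the simplified tokenizer, and the two sample streams are made up, and real PDFs need a proper PDF library to decompress and parse the streams first. It says nothing about the specific files discussed in this thread.

```python
import re

# Simplified tokenizer: literal strings, hex strings, array brackets, other tokens.
# Assumption: no nested parentheses inside strings and no inline images.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>|[\[\]]|[^\s\[\]()<>]+")

# Operators that actually paint text on the page.
SHOW_OPS = {"Tj", "TJ", "'", '"'}


def text_visibility(content_stream):
    """Return (operator, visible) for each text-showing operator in the stream."""
    mode = 0      # default text rendering mode: fill (visible)
    stack = []    # q/Q save and restore the graphics state, including the mode
    prev = None   # previous token, used as the operand of Tr
    result = []
    for tok in TOKEN.findall(content_stream):
        if tok == "q":
            stack.append(mode)
        elif tok == "Q" and stack:
            mode = stack.pop()
        elif tok == "Tr" and prev is not None and prev.isdigit():
            mode = int(prev)
        elif tok in SHOW_OPS:
            # Mode 3 = neither fill nor stroke, so the text is invisible.
            result.append((tok, mode != 3))
        prev = tok
    return result


# Hypothetical scan+OCR page: a full-page image, then invisible text on top.
scan_ocr = "q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 12 Tf 72 700 Td (Hello) Tj ET"
# Hypothetical edited page: ordinary visible text.
edited = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET"

print(text_visibility(scan_ocr))
print(text_visibility(edited))
```

If a page's text turned out to be drawn in a visible mode with real fonts, instead of sitting invisibly over a scanned image, that would fit the "edited in a PDF program" reading better than the "scan+OCR" one. A sketch like this only gives a hint, though, not proof.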