Hacker News
by matrixagent 2006 days ago
Read it again. This bug could hit you without ever using ABBYY yourself. Apple broke Preview.
4 comments

I read it. It doesn’t show that Apple broke Preview. It says Preview stopped being compatible with PDFs produced by ABBYY.

Those PDFs could have been buggy all along, and only now be showing up due to improvements in Preview.

It’s possible that Apple broke Preview, but having seen how poorly maintained ABBYY is, I wouldn’t be surprised if it was producing malformed PDFs that just happened to work on older versions of Preview.

I honestly don't care who is exactly at fault. I definitely blame Apple for destroying the PDF. There might be plenty of blame left for ABBYY for creating a bad PDF in the first place. That does not change how utterly unacceptable Preview.app's behavior here is. If you honestly think there is no valid criticism for Apple here, I don't know what to tell you. In any case, the whole thing is annoying technology – and seeing some of the comments here, people should more often try to step out of their own mind bubble and try to look at these things from their grandparents' or similar perspective. I personally am very aware of plenty of ways to work around this issue. But Apple used to be the company you could use if you either are not or don't want to be concerned about things like that.

And a quick edit: I really dislike people who discuss like you do – all of what I just wrote is already stated in the post itself. Another commenter even called you out for ignoring that part when quoting me. I should not have wasted five minutes spelling it out for you again, your mind won't be changed anyway.

With all due respect, if you had led with your “footnote”, you wouldn’t have even been able to write such an angry sounding piece. Also, you say that you don’t care who is at fault, but that isn’t how the piece came off. It’s odd you’d say that now.

I quoted you because you hadn’t incorporated this key information into the main text.

What I think is that there are many ill-formed PDFs out there and that supporting the intersection of all of them is essentially impossible.

I also think that it’s every bit the responsibility of a company like ABBYY to generate good PDFs. How can it not be? Relying on Preview to be forgiving when you are a maker of PDF generation software is obviously irresponsible.

Why do you think ABBYY didn’t announce that this problem existed when the Big Sur betas were available for them to test with?

For what it’s worth, I stopped recommending Fujitsu scanners years ago. For a while I loved mine, but none of the software was well maintained.

"Broke" may be a bit harsh. What appears to be happening is that Preview somehow loses or corrupts the ToUnicode map, which is apparently located in the metadata, when saving the PDF. Mind that every application will have to reassemble/reflow the metadata when saving a reflowed document (like after cropping and/or discarding pages). To do so, the application has to interpret and reassemble the metadata before writing it back.

Now, some algorithms and routines may be more robust and allowing than others. Maybe, an innocent refactoring attempt just lost that critical bit of robustness, required to deal with that particular format produced by this particular application.
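
One rough way to check whether a save dropped the ToUnicode maps mentioned above is to count the `/ToUnicode` keys in the file before and after. This is only a sketch: the function names are made up for illustration, and it assumes the keys appear uncompressed in the file. A PDF that keeps its objects in compressed object streams would need a real PDF parser instead of a byte scan.

```python
def count_tounicode_refs(pdf_bytes: bytes) -> int:
    """Count literal /ToUnicode keys in the raw bytes of a PDF.

    Assumption: the dictionaries holding these keys are not stored
    inside compressed object streams, so the key is visible as plain bytes.
    """
    return pdf_bytes.count(b"/ToUnicode")


def text_layer_may_be_damaged(before: bytes, after: bytes) -> bool:
    """Return True if the saved copy has fewer /ToUnicode keys than the original."""
    return count_tounicode_refs(after) < count_tounicode_refs(before)
```

Reading the original and the re-saved file with `open(path, "rb").read()` and passing both to `text_layer_may_be_damaged` gives a quick warning sign. A lower count is not proof of corruption, just a hint that the text mapping changed.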

For example, consider an XML-based format, where a particular application delivered a malformed document, like a missing closing quote for some attribute. Most XML interpreters will churn happily along with this, but, after a rewrite of some routine, an application just ignores the malformed tag with the runaway string. Did it break XML? Or did it just fail to interpret a malformed document, it had somehow been able to deal with thanks to some extra robustness present in the previous version?

Considering this hypothetical case: Should that application be improved by an update to regain its previous robustness? Yes, absolutely. Is it a bug and is the vendor to blame? Probably not. Mind that this might be quite well what is happening here, as well.
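
The XML hypothetical above can be sketched in a few lines. This is an illustration under stated assumptions, not a claim about how Preview or any real application works: `parse_lenient` and its repair rule are invented for the example. Note that Python's standard parser is strict and rejects the document outright, so the "extra robustness" has to be added by the application.

```python
import re
import xml.etree.ElementTree as ET

# A document with a missing closing quote on the "name" attribute.
MALFORMED = '<doc><item name="foo>text</item></doc>'

# Hypothetical repair rule: an attribute value that runs straight into '>'
# without a closing quote gets the quote put back.
_RUNAWAY_ATTR = re.compile(r'="([^"<>]*)>')


def parse_strict(text):
    """Parse with the standard library; raises ET.ParseError on malformed input."""
    return ET.fromstring(text)


def parse_lenient(text):
    """Try a strict parse first; on failure, apply the repair rule and retry."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        repaired = _RUNAWAY_ATTR.sub(r'="\1">', text)
        return ET.fromstring(repaired)
```

If a refactoring replaced `parse_lenient` with `parse_strict`, documents from the sloppy producer would stop loading even though the parser became more correct, which is the situation the hypothetical describes.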

I think you might interpret the word more harshly than I intend it to be. Something that worked before Big Sur is not working in Big Sur. It's broken. That's all there is to it for me as an end user. Of course there is a difference in reasons for things breaking, and I'm not denying that Apple might have "improved" Preview.app when you judge it by how well it adheres to a PDF spec. But it still "broke" a very common and normal workflow by doing so, and I would at least expect them to acknowledge that. If they decide breaking this is worth it, that's absolutely fine with me. But I don't think they are even aware of it, and that is not something I'm willing to accept from a company with the reach and resources of Apple.
Are you happy to accept corrupt PDFs being generated by ABBYY, a company that does nothing but write software to produce PDFs and yet hasn’t even commented, let alone maintained their software?

Do you think ABBYY is even aware of the issue?

It seems like you expect Apple to test everyone else’s software, and make workarounds for their bugs.

Regardless of Apple’s scale and resources, this is obviously unreasonable.

Also you say: “That's all there is to it for me as an end user” as if you didn’t understand anything about the complexities of software development, but reading your comments on other topics, you are obviously skilled in the art. You are not ‘just an end user’ who doesn’t understand the complexities.

So is there an active Preview corruption example that doesn't involve ABBYY? I've used FineReader before for a commercial effort, I do remember it being very finicky.
It's unrelated to OCR, but there have been other Preview issues. We ran into an issue a few years ago where saving a PDF with forms in Preview would set some style setting so that in most other readers, the form fields would have no background color and use white text, making them unreadable. They were still perfectly readable in Preview, but had the style issues in Chrome, Firefox, Acrobat (and Reader), and Foxit. We have a project where we programmatically fill in PDF forms using PDFtk and one day, our editor just started spitting out empty PDFs. After troubleshooting, we traced it back to the style changes Preview was making after another dev had accidentally done a CMD+S on the template file and committed the template.

In short: NEVER save a PDF with Preview. You should probably just avoid opening it in Preview period, frankly.
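
For the committed-template accident described above, one possible guard is to pin a checksum of the known-good template and refuse to fill forms if the file no longer matches. This is a minimal sketch, not what the project above actually did; the function names and the idea of storing a pinned digest are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def template_unchanged(path: Path, expected_digest: str) -> bool:
    """Return True if the template still matches the pinned digest.

    Any re-save that changes the file's bytes, including an accidental
    save from a PDF viewer, will change the digest and make this return False.
    """
    return sha256_of(path) == expected_digest
```

Running `template_unchanged` in a test or a pre-commit hook would flag a silently modified template before it reaches the form-filling step.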

I personally don't use anything else, but when the problem first occurred a few years ago, it was not limited to PDFs from ABBYY. (Which is not to say that it's purely Preview.app's fault. Maybe all of these PDFs were created in a bad way, would not surprise me at all. Could very well be that Preview.app is actually "improving" and fixing old bugs/cruft, breaking things that worked before but never should have in the first place. As the end user that doesn't really matter for me though, as I said in the post itself.)
> Could very well be that Preview.app is actually "improving" and fixing old bugs/cruft, breaking things that worked before but never should have in the first place.

Exactly this.

So the question is, whose responsibility is it? Apple’s to magically support the intersection of all the broken software?

You essentially have argued that ABBYY is popular enough that Apple should have tested it.

Maybe it is popular, but the implication is that Apple would need to regression test against all this popular PDF generating software for any change to the Preview engine, since they wouldn’t be able to know for sure what software’s PDFs would be broken by conforming changes.

What we know they did, was to make a copy of Big Sur available for ABBYY to use to test their own software. That’s pretty standard practice in a case like this and is them behaving responsibly.

If Preview really was at fault, ABBYY could have raised the issue with Apple, and/or put a warning in their own software. If it’s that popular, you’d think they would have an incentive to do this.

What isn’t obvious is that Apple should somehow introduce workarounds every time a third party doesn’t fix a bug.

But the file was still generated with ABBYY, even if you give it to someone else.