by rhn_mk1 1546 days ago
> data is extractable

That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

Stepping away from the best case, PDFs in theory look the same everywhere, but turn into a mess on buggy implementations or differing rendering engines. Because the format insists on a stable presentation, it assumes positioning and sizing always work, so when they fail, the result is worse than a buggy rendering of a presentation-agnostic document like an HTML page.

(In my experience, bugs creep in either just before printing or when displaying with JS-based renderers.)


> That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

Good point. From what format are tables, diagrams, and formulas extractable while retaining their formatting? I've had good luck moving tables between my web browser and email applications, though it always surprises me that their HTML implementations are similar enough for that to work.

> PDFs in theory look the same everywhere, but turn into a mess on buggy implementations or differing rendering engines

I don't deal with PDFs programmatically, and it sounds like you might, but from the user end, and from running networks of thousands of users, I've hardly ever seen problems in practice except with browsers' JavaScript renderers.