by otterley 4480 days ago
For archival purposes, I strongly recommend saving extra versions of documents in PDF format. Those should be readable forever.
5 comments

Pages '13 can't save PDFs with links. The official workaround? Use Pages '09.

Pages '13 can't open templates created with Pages '09.

Pages '13 can't copy and paste lists with numbers into plain text.

There's a reason why it has 2 stars on the App Store. Somebody seriously dropped the ball at Apple. Office never pulled shit like this. As soon as '14 is out for Mac, I'm removing iWork. Caring about design is one thing; caring about design more than the existing work of your users is another.

Text formats should be readable forever. PDF, while generally excellent in this regard, has already broken backwards-compatibility on several occasions (or rather: Acrobat has, which is not formally the same as the format doing so, but in practice there isn’t a whole lot of difference; you might as well just read the raw XML in an “unopenable” Keynote document).

Adobe’s marketing will tell you otherwise, of course. I used to share an office at Berkeley with Paulo Ney de Souza, who had a wonderful collection of “legacy” pdf files that could no longer be opened in Acrobat that he would trot out for the Adobe sales people when they came by (he was helping to get MSP off the ground at that point).

PDF is probably the best choice for preserving “design”, but I wouldn’t trust it for preserving content any more than any other format. Always keep a plain text copy.

> PDF is probably the best choice for preserving “design”, but I wouldn’t trust it for preserving content any more than any other format. Always keep a plain text copy.

I agree, but have a look at PDF/A (A is for Archiv{e|al}): http://en.wikipedia.org/wiki/PDF/A

> PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for the digital preservation of electronic documents.

> PDF/A differs from PDF by omitting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding).

PDF has been an open spec for a few years now.

http://www.adobe.com/devnet/pdf/pdf_reference.html

Open spec doesn't help unless all the creation tools adhere strictly to the specification. Historically, they haven't, and support for their various "quirks" has been uneven at best.
Officially, OOXML (aka .docx and friends) is an open spec, too.
And that's a good thing. OOXML is a terrible format, but it's still an open spec. Or do you want .doc back?
.docx (and co.) isn't really an open spec. Microsoft forced it through the standardization process, but it doesn't really deserve the title.

The "spec" is full of statements like "render this the way that it was done in Office 95".

The best solution would be to use ODF, which is supported by all office software.... except Apple's.

Yea, I mentioned it in my wishlist for Satya: http://hal2020.com/2014/03/03/satya-shuffles-his-leadership/...
I agree. The same argument could probably have been made for PostScript (PS) at some point in time, and while it's still around, most (non-technical) people don't use it.
I was recently asked for my resume, a document which I created in Pages and then exported to PDF. So sure, what I had was readable, but it was out of date. Thus began the painful cycle of getting Pages (09) back on my machine - for some reason I had deleted it in the interim. Boy was that a mistake. I'm just lucky I still had a disk image of the old iWork - without that I would have been hosed.
I'd rather bet on something you can implement all by yourself without needing to wade through a thousand page spec. How about plain uncompressed text and netp[bgp]m for images?
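To give a sense of how little there is to implement, here is a minimal sketch of a writer and reader for plain (ASCII, "P2") PGM, the grayscale member of the netpbm family. The function names `write_pgm` and `read_pgm` are made up for illustration, and the sketch assumes a flat list of integer pixel values; it also ignores the format's recommendation to keep lines under 70 characters.

```python
def write_pgm(width, height, pixels, maxval=255):
    """Serialize a grayscale image as plain (ASCII) PGM text."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    # Header: magic number, dimensions, maximum gray value.
    lines = ["P2", f"{width} {height}", str(maxval)]
    # One text line per image row (a simplification: very wide rows
    # would exceed the 70-character line length the format suggests).
    for row in range(height):
        start = row * width
        lines.append(" ".join(str(v) for v in pixels[start:start + width]))
    return "\n".join(lines) + "\n"


def read_pgm(text):
    """Parse plain PGM text back into (width, height, maxval, pixels)."""
    # Drop '#' comments, then treat everything as whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain PGM file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    pixels = [int(t) for t in tokens[4:]]
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match header")
    return width, height, maxval, pixels
```

Calling `read_pgm(write_pgm(2, 2, [0, 128, 255, 64]))` round-trips the four pixel values, and the intermediate text stays readable in any editor.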
There are FOSS implementations of PDF. Mozilla has one, I think.

Edit: yep.

http://mozilla.github.io/pdf.js/

Yeah, that's the one that always bitched and moaned about not being able to display the document correctly. I got tired of it and disabled the thing.

I've used other PDF viewers (Evince, xpdf, gs, mupdf), and I'll tell you what: I've come across PDFs they cannot display properly.

If I have to rely on others implementing things for me, and there is concrete evidence that others have trouble doing it, why would I rely on such a format?

As they say: "patches welcome".

Do you also implement your own OS from scratch, or do you rely on others?

Every FOSS OS I know of has patches coming out on near-daily basis.

"Patches welcome" is an aggressive, user-hostile, anti-social response to being told that the thing you suggested does not work. It's telling the user to fuck off because your own suggestion was flawed. clarry didn't run to HN and scream "pdf.js sucks!".
No, it isn't.

It's the whole point of FOSS. If it doesn't work for you, fix it.

This criticism is especially off-base given that the OP said he wanted something he could control. Use the source, Luke!

I think you're reading too much into that, or have a chip on your shoulder, or both. "Patches welcome" is an invitation, a smiling, friendly, we-think-you're-good-enough-and-want-your-help, open-handed gesture that is meant to encourage cooperation and evoke the deeply human drive to help others.
> Do you also implement your own OS from scratch

No, not today. Maybe in the future, who knows.

But I try not to depend on too many things and people, if there's a way around it. I like to have control. That's freedom to me, and freedom gives me peace of mind.

So, no, I don't see why I should waste my precious free time improving software support for a format that I find way overcomplicated and just plain silly. Why should I?

No one is telling you what to do.

If plain text does the job for you, great. If not, and you come up with something better than PDF, also great. I'll be happy to use it.

Complaining is always a lot easier than doing.

Hmm, I've almost never had issues with pdf.js, and the handful of times it's been a problem, Evince has worked fine. It is the default PDF viewer in Firefox now, so I assume it works well enough for them to do that. I think the PDFs that can't be opened correctly may be blamed on the creation tool screwing up rather than the reader. Though I guess I don't know if those handful of PDFs were up to spec or not.
This bugs me a lot. If the spec is known, how is it possible for some random reader to not be able to display it? Is PDF now somewhat like HTML in the late '90s, where rendering was defined by the browser rather than by the specification?
There's no formatting info in those. Why not just include a statically linked copy of xpdf? The x86 family will be around for a while, and if it goes away I'm sure there will be emulators.
What do you do when xpdf doesn't display the pdf correctly?

What system does that statically linked thing run on? Is that system going to remain compatible for decades? Or emulators capable of running an old version of it?

Why overcomplicate matters when you can pick something you know just works?

Because I care about formatting? Anyway, documents I still haven't touched decades and decades from now, I probably won't care about.

I'm just not especially concerned. It's not like I get better life out of my paper documents, as I don't worry about the whole acid-free paper in a hermetic container deal.

I save my archival stuff in .HTML

That's worked out fairly well so far. Stuff I wrote back in the 90s is still readable. Probably not that great if you're a designer, who needs something to look just so when printed. But for near enough everything I do, it's more than sufficient.

Easy to convert too, since it's just tagged text.
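As a rough illustration of the "just tagged text" point, here is a minimal sketch that pulls the text out of an HTML document using only Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` are made up for this example, and the approach is deliberately crude: every text node becomes its own line, so inline markup such as bold text splits a sentence.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    # Elements whose contents are not document text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Feeding it a page from the 90s with a heading and a few paragraphs gives back those strings in order, with entities such as `&amp;` already decoded.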

This is the most sensible thing I've heard for years.