|
|
|
|
|
by makeitdouble
179 days ago
|
|
Purely anecdotal, but I hoard a lot of personal documents (shopping receipts, confirmation emails, scans etc.) and for stuff I saved only 10 years ago, the toughest to reopen are the pure text files. You rightly mention Unicode, as before that there was a jungle of formats. I have some in UTF-16, some in SJIS, a ton in EUC, other were already utf-8, many don't have a BOM. I could try each encoding and see what works for each of the files (except on mobile...it's just a PITA to deal with that on mobile). But in comparison there's a set of file I never had issues opening now and then: PDFs and jpegs. All the files that my scanner produced are still readable absolutely everywhere. Even with slight bitrot they're readable, and with the current OCR processes I could probably put it all back in text if ever needed. If I had to archive more stuff now and can afford the space, I'd go for an image format without hesitation. PS: I'm surprised you don't mention the Unicode character limitations for minority languages or academic use. There will still be characters that either can't be represented, or don't have an exact 1 to 1 match between the code point and the representation. |
|
I've worked with lots of minority languages in academic situations, but I've never run into anything that couldn't be encoded in Unicode. There's a procedure for adding characters (or blocks of characters) for characters or character sets that aren't already included. There are fewer and fewer of those. The main requirement is documentation.