by eatmygodetia 1572 days ago
I feel like a lot of "use plain text" proponents forget that outside of ASCII and now UTF-8, lots of alleged plain text documents with diacritics or non-Latin characters are at least slightly difficult to open because of their somewhat esoteric encodings. Plain text isn't as universal as is often claimed, although it is immensely simpler than some other formats.

But maybe we should all use monochrome bitmap files for everything? That would be very simple.

4 comments

Yes, I feel this in my bones as someone who previously worked for a text messaging provider. Plain text has the deceptive appearance of simplicity, but it is actually one of the most maddening things to get right, especially if you intend to support the accurate transmission of said text to any possible text message receiving device in the world.
If it's 2022 and someone is _still_ saving plaintext in a non-Unicode encoding where going with Unicode is a perfectly viable option, I will personally ensure that (figuratively) they are burnt at the stake.

In addition to UTF-8, my language happens to have ~2 additional code pages/Latin-based encodings. Some websites still serve (or very recently used to serve) text files in such broken encodings, so I have to convert such files before use. It's deeply unpleasant. Windows has supported UTF-8 in some fashion for over 15 years; get with the program, people.
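
For anyone stuck doing the same conversion, here is a minimal Python sketch of it. The function name `to_utf8` and the default code page `cp1250` are my own assumptions for illustration (the actual legacy encodings aren't named above), so swap in whatever your files really use.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "cp1250") -> bytes:
    """Return the text in `raw` re-encoded as UTF-8.

    `legacy_encoding` is a hypothetical default; pick the code page
    your files were actually written in.
    """
    try:
        # If the bytes already decode as strict UTF-8, leave them alone.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the legacy code page and re-encode.
        return raw.decode(legacy_encoding).encode("utf-8")


# A Czech phrase saved in the assumed legacy code page.
legacy_bytes = "žluťoučký kůň".encode("cp1250")
converted = to_utf8(legacy_bytes)
```

The "try UTF-8 first" check is only a heuristic: it works because accented text in a single-byte code page is rarely valid UTF-8 by accident, but it cannot tell two legacy code pages apart.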

(I would make an exception for preserving historical non-UTF-8 files in their original byte-exact form, for the same reason that I wouldn't digitise an analogue photograph and then burn the original - but let's be real, all such files have been created by now)

That is why I tend to always keep files in plain ASCII, even though two out of my three primary languages need characters not in ASCII.

File longevity wins over grammatical correctness most of the time for me. I have text files going back to the 80s, so I'm glad I didn't use any fancier software to write them as they'd be completely unreadable today.

I think for a plaintext format to be "complete", it needs some mechanism of associating the language with some segment of text. Plaintext formats that don't acknowledge unified characters are just Latin-biased.
that's basically the point - you can open an ascii file now because utf-8 is ascii oriented, but a utf-8 first editor will struggle with an old french text file for example. plain text has inbuilt biases which have changed over time, it's not as pure or as simple as people say.
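
A quick Python sketch of that asymmetry, assuming the "old french text file" was saved as Latin-1 (ISO 8859-1); that encoding and the helper name `decodes_as_utf8` are my assumptions, not something stated above.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """True if `raw` is valid under a strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII bytes are also valid UTF-8, byte for byte.
ascii_bytes = "plain old text".encode("ascii")

# The same kind of file with French accents, saved as Latin-1 (assumed).
french_bytes = "déjà vu".encode("latin-1")

print(decodes_as_utf8(ascii_bytes))   # True
print(decodes_as_utf8(french_bytes))  # False

# The text is only recoverable if you already know (or guess) the encoding.
recovered = french_bytes.decode("latin-1")
```

Nothing in the file itself says "this is Latin-1", which is why the editor has to guess.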
That's something I've always thought; plain text is pure and wonderful for me, an Anglo-writing American, because most of these formats were written for people like me.

I suspect for nearly every other language (or at least any language that doesn't use the ~100 characters/symbols used in the English alphabet), old ASCII text isn't terribly useful.