|
|
|
|
|
by ninkendo
1638 days ago
|
|
Eh, fair enough. While you’re correct, character sets that are “ascii, but something custom when the high bit is 1” are all just “ascii” to me, in that they are all mutually incompatible for anything other than the first 127 characters, and 8-bit encoding in general has been ubiquitous for nearly as long as ascii has been defined. (Meaning that when most people say “ascii”, they’re actually referring to one of those encodings in practice.) Asiatic character sets are an interesting point though. I wonder how common they were at the time of what Linus wrote… |
|
Don't call them just "ASCII"--that only serves to confuse people. Call them 8-bit ASCII-compatible charsets if you need a collective noun, but note that they are very different.
> (Meaning that when most people say “ascii”, they’re actually referring to one of those encodings in practice.)
Having actually worked on charset handling, when most people say "ASCII", they mean "ASCII" and not anything else. If a document is labeled as ASCII, then generally it should be handled as Windows-1252. If a conversion function claims to convert ASCII to something else, and doesn't provide any error mechanism (which it really should), then it usually means ISO-8859-1 aka Latin-1 aka map each byte to the first 256 Unicode characters.
But I'd never see, e.g., a KOI8-R document referred to as ASCII, nor anything that claimed to be ASCII assumed to be a KOI8-R document.
> Asiatic character sets are an interesting point though. I wonder how common they were at the time of what Linus wrote…
https://4.bp.blogspot.com/-O4jXmTm7WWI/Tyw1As8jt7I/AAAAAAAAI...
At the time he wrote that, the main Asiatic charsets for Chinese and Japanese would have been more common than UTF-8. Maybe Korean as well, although Linus's message is around the time that UTF-8 overtook EUC-KR. In any case, anyone who knew anything about character sets at the time would have been well aware of Asiatic variable-width character sets.