Hacker News
by kps 2388 days ago
‘Remove’ is too strong, since Unicode is entrenched. But there are things that should have been done differently. For instance, combining characters and operators should have been placed before the base character rather than after, so that (a) it would be possible to know when you've reached the end of a character^W glyph^W grapheme cluster without reading ahead, and (b) dead keys would be identical to the corresponding characters.
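
To make point (a) concrete, here is a rough Python sketch, not a real segmenter. It approximates "combining character" as "nonzero canonical combining class", which is much cruder than the grapheme cluster rules in UAX #29. The function names and the prefix ordering are hypothetical, for illustration only.

```python
import unicodedata

def clusters_postfix(stream):
    """Marks FOLLOW the base (Unicode's actual order).

    A cluster can only be emitted after the NEXT base character
    (or end of input) has been read, i.e. one character of lookahead.
    """
    current = ""
    for ch in stream:
        if current and unicodedata.combining(ch) == 0:
            yield current          # known complete only now
            current = ""
        current += ch
    if current:
        yield current              # flushed at end of input

def clusters_prefix(stream):
    """Hypothetical order: marks PRECEDE the base.

    The cluster is complete the moment its base character arrives.
    """
    current = ""
    for ch in stream:
        current += ch
        if unicodedata.combining(ch) == 0:
            yield current          # no lookahead needed
            current = ""
    if current:
        yield current              # dangling marks with no base

# "résumé" with U+0301 COMBINING ACUTE ACCENT in each order
postfix_text = "re\u0301sume\u0301"
prefix_text = "r\u0301esum\u0301e"   # not valid Unicode usage, illustration only

postfix_clusters = list(clusters_postfix(postfix_text))
prefix_clusters = list(clusters_prefix(prefix_text))
```

In the postfix version the last cluster is only emitted when the input ends. In the prefix version every cluster is emitted as soon as its base character is read.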

> façade and résumé

ASCII (1967) allowed for them: c BS , or , BS c ↦ ç and e BS ' or ' BS e ↦ é. Encoding ç as 63 CC A7 is not manifestly better than encoding it as 63 08 2C.
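
For anyone who wants to check the two byte sequences, a small Python sketch (the helper name `hex_bytes` is made up for this example):

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)

# Unicode: 'c' followed by U+0327 COMBINING CEDILLA, encoded as UTF-8
combining_form = "c\u0327".encode("utf-8")

# ASCII overstrike: 'c', BS (0x08), ',' (0x2C)
overstrike_form = b"c\x08,"

print(hex_bytes(combining_form))   # 63 CC A7
print(hex_bytes(overstrike_form))  # 63 08 2C
```

Both sequences are three bytes long.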


> ASCII (1967) allowed for them: c BS , or , BS c ↦ ç and e BS ' or ' BS e ↦ é. Encoding ç as 63 CC A7 is not manifestly better than encoding it as 63 08 2C.

Doesn't work for ñ, since the ASCII ~ is often typeset in the middle of the box instead of in a position to appear above an 'n' character. " is a pretty poor substitute for ◌̈ though, especially when you're trying to write ï as in naïve. And then there's the æ of archæology, which doesn't work with overwriting.

I'll also point out that ç is U+00E7 in Unicode and C3 A7 in UTF-8, not 63 CC A7, since it's a precomposed character (and NFC is usually understood to be the preferred normalization form for Unicode unless there's a reason to do something else).
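
A quick way to see the relationship between the two forms, sketched with Python's standard `unicodedata` module (variable names are just for illustration):

```python
import unicodedata

decomposed = "c\u0327"                                  # 'c' + COMBINING CEDILLA
precomposed = unicodedata.normalize("NFC", decomposed)  # composes to one code point

# NFC yields the single precomposed code point U+00E7
assert precomposed == "\u00e7"
assert len(decomposed) == 2 and len(precomposed) == 1

# UTF-8 bytes of each form
decomposed_utf8 = decomposed.encode("utf-8")     # b'c\xcc\xa7'  (63 CC A7)
precomposed_utf8 = precomposed.encode("utf-8")   # b'\xc3\xa7'   (C3 A7)

# NFD goes the other way, back to base + combining mark
roundtrip = unicodedata.normalize("NFD", precomposed)
assert roundtrip == decomposed
```

The two strings are canonically equivalent, but they only compare equal after being normalized to the same form.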

Tilde exists in ASCII because of its use as an accent. (In 1967 the non-diacritic interpretation was an overline.) The use in programming languages, and lowering to fit other mathematical operators, came later.

There was never any requirement that ‘n BS ~’ have the same appearance as ‘n’ overprinted with ‘~’, although terminals capable of making the distinction didn't appear until the 70s.

Precomposed characters aren't relevant to illustrating composition mechanisms.