by Someone 2472 days ago
https://frontiermyanmar.net/en/features/battle-of-the-fonts (mentioned in the article being discussed) is much clearer.

Short version: in Burmese, the form a character takes depends on context. Zawgyi ‘solves’ that by having separate code points for the different forms, requiring the user to pick the right variant. The Unicode way is to make the font and the text renderer smarter, so that they pick the right form from context, just as a Unicode renderer draws a single “é” for the two code points “e” plus a combining acute accent.
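To make the “é” comparison concrete, here is a small sketch using Python's standard unicodedata module. It shows normalization rather than rendering (rendering happens in the font and shaping engine), so treat it as an analogy only; the variable names are made up for illustration.

```python
import unicodedata

# Two code points: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# One code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"

# NFC normalization composes the pair into the single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)

# The strings differ code point by code point before normalization,
# but are equal after it.
differs_raw = decomposed != precomposed
same_after_nfc = composed == precomposed
```

Both spellings are meant to look identical on screen; the user never has to choose a variant by hand.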

Zawgyi also, necessarily, uses Unicode code points assigned to other characters to encode the variants.

1 comments

The shape of "lowercase sigma" depends on whether it's in the middle of a word or at the end. The two forms have adjacent code points (U+03C2 and U+03C3).

ς and σ. I won't shout out their names. Is this the case in modern Greek too?
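A quick way to check the two sigmas, sketched in Python. The sample word is just an arbitrary Greek word in capitals, written with escapes so the code points are unambiguous. As far as I know, CPython's str.lower() applies the Unicode final-sigma rule, so lowercasing picks the word-final form on its own.

```python
import unicodedata

final_sigma = "\u03c2"  # ς, used at the end of a word
sigma = "\u03c3"        # σ, used everywhere else

# The two code points sit right next to each other.
adjacent = ord(sigma) - ord(final_sigma) == 1

# Official Unicode character names for both forms.
names = [unicodedata.name(c) for c in (final_sigma, sigma)]

# "ΟΔΟΣ" in capitals; there is only one capital sigma, U+03A3.
word_upper = "\u039f\u0394\u039f\u03a3"
# Lowercasing has to decide which lowercase form the sigma becomes.
word_lower = word_upper.lower()

# A capital sigma on its own is not word-final, so it maps to the plain σ.
lone_lower = "\u03a3".lower()
```

So here the context-dependent choice is made by the case-mapping algorithm rather than by the font, but the forms still have separate code points.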

Many such warts in Unicode exist to allow round-tripping with older 8-bit character encodings. I suspect that’s the case here, too: https://en.wikipedia.org/wiki/ISO/IEC_8859-7 has both forms.
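The round trip is easy to try with Python's built-in iso8859_7 codec. This is only a sketch of the idea; the byte values 0xF2 and 0xF3 are the ISO 8859-7 positions of the final and the regular sigma.

```python
# The two lowercase sigmas as they appear in ISO 8859-7.
legacy = bytes([0xF2, 0xF3])

# Decoding maps each byte to its own Unicode code point.
text = legacy.decode("iso8859_7")

# Encoding again reproduces the original bytes exactly.
round_tripped = text.encode("iso8859_7")
lossless = round_tripped == legacy
```

If Unicode had merged the two forms into one code point, the second step could not tell which byte to write back.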

That doesn’t explain why Unicode seems to have 27 (!) different “sigma” code points, though (https://en.wikipedia.org/wiki/Sigma#Character_encoding)
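For anyone who wants to look at the list, here is a rough sketch that scans the character database shipped with Python for names containing "SIGMA". The helper name sigma_code_points is made up, and the count depends on the Unicode version your Python was built with and on what you decide counts as a sigma, so it need not match the figure on the Wikipedia page.

```python
import sys
import unicodedata


def sigma_code_points():
    """Return (code point, name) pairs whose Unicode name contains 'SIGMA'."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned code points have no name; use "" as the default.
        name = unicodedata.name(chr(cp), "")
        if "SIGMA" in name:
            found.append((cp, name))
    return found


sigmas = sigma_code_points()
for cp, name in sigmas:
    print(f"U+{cp:04X} {name}")
print(len(sigmas), "code points, Unicode", unicodedata.unidata_version)
```

Most of the extra entries are styled mathematical letters (bold, italic, sans-serif and so on) rather than anything used to write Greek.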