One key thing to know is that encoding (UTF-8, UTF-16, UTF-32) is a completely separate problem from rendering text. I have had a couple of people say to me recently something along the lines of, "We don't need text shaping since UTF-8 takes care of it." That isn't remotely true. Decoding an encoding gets you a series of Unicode code points. To render them, the code points must first have the bidirectional algorithm (bidi) applied, and the resulting "runs" from the bidi algorithm are then shaped. The text shaper uses OpenType tables within the font to convert these code points into a series of glyph indices with x/y offsets. The renderer then works entirely on glyphs, which might not even map back to a code point in the font.
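To make the "encoding only gets you code points" part concrete, here is a minimal Python sketch using only the standard library. The sample string is my own choice for illustration. It shows that the three encodings produce different byte lengths but decode to the same code points, and that each code point carries a bidi class, which is the kind of input the bidirectional algorithm works from. It does not run bidi or shaping; those need a real implementation and a font.

```python
import unicodedata

# Illustrative sample: 'e' + COMBINING ACUTE ACCENT, then HEBREW LETTER ALEF.
text = "e\u0301\u05d0"

# Encode the same code points three different ways.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

# Byte lengths differ per encoding...
byte_lengths = {name: len(data) for name, data in encoded.items()}

# ...but decoding always yields the identical code point sequence.
decoded_all_equal = all(data.decode(name) == text for name, data in encoded.items())

code_points = [f"U+{ord(ch):04X}" for ch in text]

# Per-code-point bidi classes from the Unicode Character Database.
# Nothing in the encoding step looks at these; that is left to bidi and shaping.
bidi_classes = [unicodedata.bidirectional(ch) for ch in text]

print(byte_lengths)   # {'utf-8': 5, 'utf-16-le': 6, 'utf-32-le': 12}
print(code_points)    # ['U+0065', 'U+0301', 'U+05D0']
print(bidi_classes)   # ['L', 'NSM', 'R']
```

Everything after this point (reordering runs, picking glyphs, positioning the accent) happens outside the encoding layer.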
Unfortunately I don't. I started to learn Unicode, then realized how complicated it is to do right and stopped because I realized that nobody really cares if it works almost all the time.
As Joel below demonstrates, you can get away with 29 languages by treating code points as characters and without knowing about grapheme clusters and other stuff.
> When CityDesk publishes the web page, it converts it to UTF-8 encoding, which has been well supported by web browsers for many years. That’s the way all 29 language versions of Joel on Software are encoded and I have not yet heard a single person who has had any trouble viewing them.
Not really relevant. That just demonstrates that displaying those languages works adequately; it doesn't show anything about other processing that your software might care about (e.g. sorting, searching, case conversion, keyboard input, selection and editing).
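A rough sketch of how searching, case conversion, and sorting can go wrong even when display looks fine, in Python with only the standard library. The sample strings and the `normalized_contains` helper are hypothetical, made up for illustration. Real locale-aware collation needs something like ICU, which this does not attempt.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' + COMBINING ACUTE ACCENT, usually rendered identically

# Searching: a naive substring test misses text that looks the same on screen.
haystack = "un " + decomposed
found_naive = composed in haystack

def normalized_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: compare after NFC normalization."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)

found_normalized = normalized_contains(haystack, composed)

# Case conversion: not always one code point in, one code point out.
upper = "stra\u00dfe".upper()        # 'ß' uppercases to 'SS'
folded = "Stra\u00dfe".casefold()    # casefold is intended for caseless matching

# Sorting: plain code point order puts 'Ä' (U+00C4) after 'z'.
by_code_point = sorted(["zebra", "\u00c4rger", "apple"])

print(found_naive, found_normalized)  # False True
print(upper, folded)                  # STRASSE strasse
print(by_code_point)                  # ['apple', 'zebra', 'Ärger']
```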
> As Joel below demonstrates, you can get away with 29 languages by treating code points as characters and without knowing about grapheme clusters and other stuff.
If you treat text as completely opaque it does work fine. Issues crop up when you want or need to manipulate said text, either to extract information or to modify it.
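For a sense of what "manipulate" breaks, here is a small Python sketch (standard library only, sample strings chosen by me) that treats code points as characters while truncating and reversing. Doing this properly means segmenting by grapheme cluster, which the standard library does not provide, so this only shows the failure and not the fix.

```python
import unicodedata

decomposed = "cafe\u0301"   # renders as "café", but is 5 code points

# Truncating to "4 characters" silently drops the accent.
truncated = decomposed[:4]

# Reversing by code point moves the combining mark to the front,
# where it no longer follows the 'e' it belonged to.
reversed_naive = decomposed[::-1]
starts_with_combining_mark = unicodedata.combining(reversed_naive[0]) != 0

# Two regional indicator symbols (F + R) are typically shown as one flag,
# yet they are two code points, so slicing can split the pair.
flag = "\U0001F1EB\U0001F1F7"
flag_length = len(flag)
half_flag = flag[:1]

print(len(decomposed), repr(truncated))   # 5 'cafe'
print(starts_with_combining_mark)         # True
print(flag_length, len(half_flag))        # 2 1
```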
The HarfBuzz manual touches on some of this: https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.ht...