Hacker News
by flatfinger 2257 days ago
It's too bad Unicode wasn't designed around the concept of easily recognizable grapheme clusters and "write-only" [non-round-trip] forms that are normalized in various ways. A text layout engine shouldn't need detailed knowledge of rules that are constantly subject to change. If there were a standard representation for a Unicode string in which all grapheme clusters are marked and everything is listed in left-to-right order, and an OS function were available to convert a Unicode string into that form, a text-layout engine using that OS routine could accommodate future additions to the character set and glyph-joining rules without having to know anything about them.
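Roughly, the proposal splits the work in two. An OS-provided routine (here a hypothetical `segment()`) turns a string into explicitly marked grapheme clusters. A layout routine (hypothetical `layout()`) then treats each cluster as an opaque unit. This is only a minimal Python sketch under loud assumptions. The segmentation rules are a simplified approximation, not the full Unicode Standard Annex #29 algorithm. Bidirectional reordering into left-to-right order is omitted, and the input is assumed to already be left-to-right. Each cluster also gets a fixed, made-up advance width instead of real glyph metrics.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _extends(ch: str) -> bool:
    """True if ch attaches to the preceding cluster under this simplified rule set."""
    cp = ord(ch)
    return (
        unicodedata.category(ch).startswith("M")  # combining marks (Mn, Mc, Me)
        or ch == ZWJ
        or 0xFE00 <= cp <= 0xFE0F    # variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF  # emoji skin-tone modifiers
    )


def segment(text: str) -> list[str]:
    """Hypothetical 'OS routine': split text into marked grapheme clusters.

    Simplified approximation of UAX #29, NOT the full algorithm
    (no Hangul syllables, regional-indicator pairs, CR LF, etc.).
    """
    clusters: list[str] = []
    for ch in text:
        # Join the previous cluster if ch is an extender, or if the previous
        # cluster ended with a ZWJ (e.g. emoji ZWJ sequences).
        if clusters and (_extends(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def layout(clusters: list[str], advance: int = 10) -> list[tuple[int, str]]:
    """Toy layout engine: places each cluster left to right as an opaque unit.

    It never inspects the code points inside a cluster; 'advance' is an
    assumed fixed width standing in for real glyph metrics.
    """
    return [(i * advance, cluster) for i, cluster in enumerate(clusters)]


if __name__ == "__main__":
    # decomposed "é", plain "a", and a woman-technologist ZWJ sequence
    text = "e\u0301a\U0001F469\u200D\U0001F4BB"
    print(layout(segment(text)))
```

In this sketch, only `segment()` would have to change when new characters or joining rules appear. `layout()` stays the same because it only ever sees whole clusters.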

You can't do that without committing to not supporting pathological text; otherwise you're stuck adding new special cases to the layout engine with every update anyway.

I do have some ideas for a better encoding (like, I assume, anyone competent with sufficient free time and interest in text encoding), but there's a lot of reluctance to put effort into something that's already completely eclipsed by a technically inferior but not completely unusable alternative, so I've had it mostly shelved.