

by ken 2390 days ago

The Consortium chose the domain of the problem space, though. "Text" could be either much simpler or much more complex than Unicode chose to model it. They picked what features they wanted to support, and now we all have to live with that.

In my lifetime, I've seen text systems that choose to include style (bold, italic), color (fore and back), or size with each character. Unicode did not (generally) choose to include those, even though they're arguably part of "global text". Ligatures, too, are generally considered the domain of font rendering, not text storage. Vertical text was, too, until a few months ago. "Historical" writing directions are apparently still considered out of scope for Unicode. Linear A is in scope, though, even though nobody is sure what the characters mean.

Unicode did choose to be backwards compatible with ASCII, and include most of the crazy PETSCII characters (which were pretty popular for just a couple years but not really "global text"), and some mathematical symbols that no mathematician had a use for.
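For anyone who wants to see the ASCII compatibility concretely, here is a minimal Python sketch. The sample string and variable names are my own, chosen just for illustration, and UTF-8 is only one of the encodings where the compatibility carries through to the byte level.

```python
# Illustrative sample text (hypothetical, ASCII-only).
text = "Hello, HN"

# The first 128 Unicode code points have the same values as ASCII.
code_points = [ord(ch) for ch in text]
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

print(list(ascii_bytes) == code_points)  # True
print(ascii_bytes == utf8_bytes)         # True

# Outside the ASCII range, UTF-8 needs more than one byte per character.
euro = "\u20ac"  # EURO SIGN
print(len(euro.encode("utf-8")))         # 3
```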

They chose to include both a nice component-combining system and also pre-combined glyphs where legacy codepages had used them. They chose to implement Han unification, but not analogous unifications across other scripts which have even more similar glyphs.
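To make the combining-versus-precombined point concrete, here is a minimal Python sketch using the standard library's `unicodedata` module. The sample character "é" and the variable names are my own choices for illustration.

```python
import unicodedata

# Precomposed form: a single code point, U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
precomposed = "\u00e9"

# Combining form: "e" followed by U+0301 (COMBINING ACUTE ACCENT).
combining = "e\u0301"

# The two strings render the same but differ code point by code point.
print(precomposed == combining)          # False
print(len(precomposed), len(combining))  # 1 2

# Normalization maps between the two representations.
nfc = unicodedata.normalize("NFC", combining)    # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose
print(nfc == precomposed)  # True
print(nfd == combining)    # True
```

So any code that compares or hashes text has to pick a normalization form first, which is the practical cost of supporting both representations.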

I've dug into the details of Unicode since 3.0 (20 years ago!), and found it's full of arbitrary decisions and legacy workarounds. The contributors are smart but the result looks the same as any committee full of people with conflicting goals.

Legacy support is why it's so well-adapted, and I've never seen a system where piling on legacy support made it "well-designed".

Suppose you wanted to make a system for "universal computation". The analogous method would have been to take Win16, Win32, Mac OS 9, Mac OS X, Linux, and Solaris, define the superset of all of their features, and invent a binary format which supported all of it natively. Legacy support might help get it adopted faster but nobody would call it well-designed. 20 years later, it'd clearly be simultaneously too weak and too powerful for all types of computation we want to do.

Unicode is an amazing political accomplishment. Technically, it seems rather mediocre. Nobody would ever design a text system like this unless held back by mountains of legacy 1970s/1980s systems.

1 comment

arrrg 2390 days ago

I think your fundamental mistake is thinking that you can separate technical and political concerns, or that those two things really are always distinct and cleanly separable.

To me having that kind of discussion really doesn’t make a lot of sense. You know, since “we live in a society” (at the risk of quoting a clearly thought-terminating cliche).

Unicode contains within itself thousands of design decisions, many of them trade-offs. After the fact it’s always extremely easy to swoop in and nitpick those trade-offs. No possible world exists where all those trade-offs are made correctly and, what’s more, it’s frequently simply impossible to know what a “correct” trade-off even is.

(Just one example to illustrate the scope of this problem: A certain trade-off might be worth making in one direction for use case A and in another direction for use case B; however, it’s not really easy to find out whether use case A or use case B is more frequent in the wild. What if both use cases are about equally important? Now imagine the trade-off space not being a binary space but multi-dimensional. Now imagine not just two but several use cases. Now imagine use case usage changing over time and new use cases emerging in the future.)