| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by account42 1604 days ago
	Having your strings be conceptually made up of UTF-8 code units makes them no less strings than those made up of Unicode code points. As this article shows, working with code points is often not the right abstraction anyway and you need to up all the way to grapheme clusters to have anything close to what someone would intuitively call a character. Calling a code point a character is not more correct or useful than calling a code unit a char. All you gain by having Unicode code point strings is the illusion of Unicode support until you test anything that uses combining characters or variant selectors. In essence, languages opting for such strings are making the same mistake at Windows/Java/etc. did when adopting UTF-16.