

by Araq 4775 days ago

OK, let's see: UTF-16 strings encourage bugs with surrogates. Note that most C# and Java code is notoriously broken wrt those, and yet I never hear anybody complain about it.
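To make the surrogate problem concrete, here is a rough sketch in Python (not C# or Java, so it only simulates UTF-16 by encoding to bytes). The helper names `utf16_code_units` and `decodes_cleanly` are made up for this example. It shows that a character outside the BMP takes two 16-bit code units, and that cutting a string at a code-unit boundary can leave half a surrogate pair behind.

```python
def utf16_code_units(s: str) -> int:
    """Count the 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# 'a' followed by U+1F600, a code point outside the BMP.
text = "a\U0001F600"

code_points = len(text)              # Python strings count code points
code_units = utf16_code_units(text)  # UTF-16 needs a surrogate pair here

# Naive "take the first 2 characters" done on UTF-16 code units:
# 2 units * 2 bytes keeps 'a' plus only the high surrogate.
truncated = text.encode("utf-16-le")[: 2 * 2]
truncation_ok = decodes_cleanly(truncated, "utf-16-le")

print(code_points, code_units, truncation_ok)
```

The same cut on a string made only of BMP characters works fine, which is why this kind of bug tends to survive testing.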

UTF-32 strings take up to 4x the memory of UTF-8 strings (for mostly ASCII text) and yet hardly solve anything: the proper toUpper("ß") used to be "SS" in German (nowadays there is an uppercase 'ẞ'). Other languages have other rules; i18n is hard, get used to it.
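A quick way to check both claims, sketched in Python as an illustration (the variable names are just for this example). It compares encoded sizes for an ASCII string and shows that uppercasing "ß" changes the string length, so even one fixed-width slot per code point does not make case mapping a per-element operation.

```python
sample = "hello world"

# For pure ASCII text, UTF-32 uses 4 bytes per character, UTF-8 uses 1.
utf8_size = len(sample.encode("utf-8"))
utf32_size = len(sample.encode("utf-32-le"))
ratio = utf32_size // utf8_size

# Python follows the Unicode full case mapping: one code point
# goes in, two come out.
upper_eszett = "ß".upper()
street = "straße".upper()

# The capital sharp s (U+1E9E) exists, and lowercases to "ß",
# but it is not what the default uppercase mapping produces.
capital_eszett_lower = "\u1E9E".lower()

print(ratio, upper_eszett, street, capital_eszett_lower)
```

So any code that assumes `len(upper(s)) == len(s)` is wrong no matter which encoding the string uses.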

IMO Nimrod's UTF-8 strings at least make the programming errors easier to spot instead of the "mostly working but fails for edge cases" style that UTF-16 or UTF-32 encourage.
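One way to read the "easier to spot" point, as a sketch in Python rather than Nimrod (so this is my illustration of the argument, not Nimrod's actual string API): treating code units as characters breaks in UTF-8 as soon as any non-ASCII character shows up, while in UTF-16 the same mistake only shows up for characters outside the BMP. The helper `cut_units` is hypothetical.

```python
def cut_units(s: str, encoding: str, unit_size: int, n: int) -> bool:
    """Cut s after n code units in the given encoding.

    Returns True if the result is still valid text, False if the cut
    landed in the middle of a character.
    """
    data = s.encode(encoding)[: n * unit_size]
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


common = "na\u00EFve"      # "naïve": non-ASCII, but inside the BMP
rare = "na\U0001F600ve"    # contains a code point outside the BMP

# UTF-8: the bug already shows up with everyday accented text.
utf8_common_ok = cut_units(common, "utf-8", 1, 3)

# UTF-16: the same cut works for accented text...
utf16_common_ok = cut_units(common, "utf-16-le", 2, 3)
# ...and only fails once an astral character appears.
utf16_rare_ok = cut_units(rare, "utf-16-le", 2, 3)

print(utf8_common_ok, utf16_common_ok, utf16_rare_ok)
```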