


	by int_19h 1617 days ago
	This isn't a consequence of using UTF-16 as such - Java, .NET, etc. could totally have an API around UTF-16 strings that handles surrogate pairs. The problem, rather, is that those languages introduced a 16-bit type that they called "character", even though it's really a UTF-16 code unit and not even a Unicode code point. And then they used that type throughout all string APIs, including strings themselves (indexing etc.). In .NET land you're now supposed to use https://docs.microsoft.com/en-us/dotnet/api/system.text.rune instead. It transparently handles surrogate pairs, so the app needn't be aware of anything - and yet the internal encoding is still UTF-16.
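	To make the char vs. code point gap concrete, here is a rough Java sketch (Java rather than .NET, since the same 16-bit `char` issue applies there). The class name `CodePointDemo` and its helper methods are made up for illustration; only `String.length`, `String.codePointCount`, `String.codePoints`, `Character.toChars` and `Character.isHighSurrogate` are standard library calls.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the helper names are not from any library.
public class CodePointDemo {
    // What String.length() reports: the number of 16-bit UTF-16 code units.
    public static int codeUnitCount(String s) {
        return s.length();
    }

    // The number of Unicode code points; a surrogate pair counts once.
    public static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // One String per code point, so surrogate pairs stay together.
    public static List<String> splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which UTF-16 encodes as the pair D83D DE00.
        String s = "a\uD83D\uDE00";
        System.out.println(codeUnitCount(s));                       // 3
        System.out.println(codePointCount(s));                      // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(splitByCodePoint(s).size());             // 2
    }
}
```

	So `charAt(1)` hands back half of a character, while the code point view over the very same UTF-16 storage gives two whole ones.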