by Animats 1617 days ago
Sort of. Applications using UTF-16 have to be aware of pairs at the application level. Many are not.

This isn't a consequence of using UTF-16 as such - Java, .NET, etc. could perfectly well have an API around UTF-16 strings that handles surrogate pairs. The problem, rather, is that those languages introduced a 16-bit type that they called "character", even though it is a UTF-16 code unit and not even a full Unicode code point. They then used that type throughout all string APIs, including strings themselves (indexing, length, etc.).
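To make that concrete, here is a minimal Java sketch of how the 16-bit `char` leaks through the string API. The class and method names (`SurrogatePairDemo`, `utf16Units`, `codePoints`) are made up for illustration; the calls they wrap (`String.length`, `String.codePointCount`, `String.codePointAt`, `Character.isHighSurrogate`) are standard JDK methods.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class SurrogatePairDemo {
    // Number of 16-bit char units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting each surrogate pair once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which lies outside the BMP and so
        // is stored as the surrogate pair D83D DE00.
        String s = "a\uD83D\uDE00";

        System.out.println(utf16Units(s)); // 3
        System.out.println(codePoints(s)); // 2

        // charAt(1) returns only the first half of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        // codePointAt(1) reassembles the pair into the real code point.
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600
    }
}
```

So the code-point-aware API does exist, but the default, shortest-to-type operations (`length()`, `charAt()`) all work in 16-bit units.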

In .NET land you're now supposed to use System.Text.Rune (https://docs.microsoft.com/en-us/dotnet/api/system.text.rune) instead. A Rune represents a whole Unicode scalar value and handles surrogate pairs transparently, so the app needn't deal with them itself - and yet the internal string encoding is still UTF-16.
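The same idea, a code-point view layered over UTF-16 storage, can be sketched in Java. This is an analogy and not the .NET Rune API: the class and method names (`CodePointView`, `scalars`, `firstCodePoints`) are hypothetical, and unlike Rune, Java's `codePoints()` passes unpaired surrogates through unchanged.

```java
// Illustrative sketch only: class and method names are hypothetical,
// and this is a Java analogy, not the .NET System.Text.Rune API.
final class CodePointView {
    // View the string as whole code points instead of 16-bit chars.
    static int[] scalars(String s) {
        return s.codePoints().toArray();
    }

    // Take the first n code points without ever splitting a surrogate pair.
    static String firstCodePoints(String s, int n) {
        int total = s.codePointCount(0, s.length());
        int end = s.offsetByCodePoints(0, Math.min(n, total));
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'

        System.out.println(scalars(s).length); // 3

        // Naive char-based truncation cuts the pair in half...
        String broken = s.substring(0, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(1))); // true

        // ...while code-point-based truncation keeps it intact.
        String intact = firstCodePoints(s, 2);
        System.out.println(intact.equals("a\uD83D\uDE00")); // true
    }
}
```

The storage never changes here; only the unit the caller works in does, which is the point being made about Rune.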