Hacker News
by prodigal_erik 4948 days ago
They're facets of the same problem. I shouldn't routinely be dealing with either surrogates or combining marks; unless I have a specific reason, it's only an opportunity to make a mistake that hardly anyone knows how to troubleshoot. "n̈" should be an indivisible string of length one until I need to ask how it would actually be encoded in UTF-16 or whatever.
1 comment

But that's the point - there is no such character. Given the Unicode Consortium have added codepoints for every other bloody thing under the sun, I'm amazed that there isn't one for n-diaeresis, but there you are.
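
If you want to check this yourself, here is a minimal Python sketch using the standard `unicodedata` module. The variable names are mine, purely for illustration. NFC normalization composes a base letter and combining mark into a single codepoint whenever a precomposed one exists, so it makes a handy test:

```python
import unicodedata

# 'n' followed by U+0308 COMBINING DIAERESIS
n_diaeresis = "n\u0308"
# 'o' followed by the same combining mark, for contrast
o_diaeresis = "o\u0308"

# NFC composes wherever a precomposed codepoint exists.
n_composed = unicodedata.normalize("NFC", n_diaeresis)
o_composed = unicodedata.normalize("NFC", o_diaeresis)

n_len = len(n_composed)  # stays at two codepoints: nothing to compose into
o_len = len(o_composed)  # collapses to one codepoint, U+00F6

print(n_len, o_len)
```

So normalizing does not rescue you here: the o case collapses to one codepoint, the n case stays at two.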

Add a small number of people who for artistic reasons decide that they want to make life hard (Rinôçérôse I'm looking at you) and you just have to accept that the length of your string, as a reader sees it, might not equal the number of codepoints it contains...
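
To make the mismatch concrete, a rough sketch in Python follows. `approx_char_count` and `utf16_units` are hypothetical helpers of my own, and skipping combining marks is only an approximation of what a reader sees, not full Unicode grapheme segmentation:

```python
import unicodedata


def approx_char_count(text: str) -> int:
    """Count codepoints that are not combining marks (rough approximation)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def utf16_units(text: str) -> int:
    """Count UTF-16 code units, so a surrogate pair counts as two."""
    return len(text.encode("utf-16-le")) // 2


# Written with escapes so the precomposed form is unambiguous.
name = "Rin\u00f4\u00e7\u00e9r\u00f4se"
# NFD splits each accented letter into base letter plus combining mark.
decomposed = unicodedata.normalize("NFD", name)

codepoints_composed = len(name)
codepoints_decomposed = len(decomposed)
perceived = approx_char_count(decomposed)

# One codepoint outside the BMP needs a surrogate pair in UTF-16.
emoji_units = utf16_units("\U0001F600")

print(codepoints_composed, codepoints_decomposed, perceived, emoji_units)
```

Same word on screen either way, but the codepoint count depends on which form you were handed, and the UTF-16 count can differ again once surrogates turn up.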