
Hm. Are you sure? Because the utf8everywhere article (and various Microsoft-related framework discussions) seems to suggest there's no validation anywhere. You can easily create partial code points (an unpaired surrogate), and just hitting backspace in a text field can do it. That seems to imply there's no UTF-16 validation even at a higher level.

But I will readily defer to your expertise on this. I've not coded in Microsoft land for like 18 years. MFC was my last experience with this, and I still have a vague memory of being shocked by an API that returned an int32 and whose docs told you to cast it to a void pointer (an overloaded response message). No wonder they had issues with the 64-bit migration at the time.

Edit: cite on the utf8everywhere thing. "in plain Windows edit control (until Vista), it takes two backspaces to delete a character which takes 4 bytes in UTF-16. On Windows 7, the console displays such characters as two invalid characters, regardless of the font being used."
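
To make the "two backspaces" point concrete, here's a rough Python sketch of what I think is going on. It's not Windows code, and `naive_backspace` is a hypothetical stand-in for an edit control that deletes one 16-bit code unit at a time instead of one character. The helper names are mine, not from any real API.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def has_unpaired_surrogate(units: list[int]) -> bool:
    """True if the code-unit sequence is not well-formed UTF-16."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return True
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate in front of it.
            return True
        else:
            i += 1
    return False


def naive_backspace(units: list[int]) -> list[int]:
    """Hypothetical edit-control behaviour: drop one code unit, not one character."""
    return units[:-1]


units = utf16_units("a\U0001F600")      # 'a' plus one character outside the BMP
after_one = naive_backspace(units)      # leaves 'a' plus a lone high surrogate
after_two = naive_backspace(after_one)  # only now is the character fully gone
```

After the first backspace the buffer holds a lone high surrogate, which is exactly the kind of partial code point that nothing downstream rejects unless somebody runs a check like `has_unpaired_surrogate` on it.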

Maybe they've improved since, though. But surely there's a lot of that baggage left in the libraries.