by ChrisSD
1612 days ago
No, it's UTF-16 with no validation at the kernel level. Invalid UTF-16 is also invalid UCS-2, as the surrogate code points were explicitly barred from use. In practice, only malware will create such broken names. High-level software (e.g. Microsoft's own VSCode) will not handle broken UTF-16. And indeed the built-in UTF-8 code page will lossily decode UTF-16 (unpaired surrogates are replaced with the Unicode replacement character).
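To make the lossy decoding concrete, here is a rough sketch in Python. It only illustrates the general idea of replacing unpaired surrogates with U+FFFD; it is not the Windows code page implementation, and the helper names `is_valid_utf16` and `decode_utf16_lossy` are made up for this example.

```python
def _to_bytes(units):
    # Pack 16-bit code units into little-endian bytes, the layout Windows uses.
    return b"".join(u.to_bytes(2, "little") for u in units)


def is_valid_utf16(units):
    """Return True if the code units form well-formed UTF-16."""
    try:
        _to_bytes(units).decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


def decode_utf16_lossy(units):
    """Decode code units, replacing each unpaired surrogate with U+FFFD."""
    return _to_bytes(units).decode("utf-16-le", errors="replace")


# 'a', a proper surrogate pair for U+1F600, 'b'
valid_name = [0x0061, 0xD83D, 0xDE00, 0x0062]
# 'a', an unpaired high surrogate, 'b' (the kind of name the kernel accepts)
broken_name = [0x0061, 0xD800, 0x0062]

print(is_valid_utf16(valid_name), is_valid_utf16(broken_name))
print(decode_utf16_lossy(broken_name).encode("unicode_escape"))
```

The decode is lossy because two different broken names can map to the same string after replacement, so the original name cannot be recovered from the result.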
But I will readily defer to your expertise on this. I haven't coded in Microsoft land for something like 18 years. MFC was my last experience there, and I still have a vague memory of being shocked by an API that returned an int32 and told you to cast it to a void pointer (an overloaded response message). No wonder they had issues with the 64-bit migration at the time.
Edit: cite on the utf8everywhere thing. "in plain Windows edit control (until Vista), it takes two backspaces to delete a character which takes 4 bytes in UTF-16. On Windows 7, the console displays such characters as two invalid characters, regardless of the font being used."
Maybe they've improved since though. But surely there's a lot of that baggage in the libraries.
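For what it's worth, the "two backspaces" symptom in that quote is what you would expect from code that treats each 16-bit code unit as a character. A small Python sketch of the arithmetic (the helper name `utf16_code_units` is just for illustration):

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units for a string."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


emoji = "\U0001F600"  # a character outside the Basic Multilingual Plane
units = utf16_code_units(emoji)

# One user-visible character, but two code units (4 bytes) in UTF-16.
print(len(units), [hex(u) for u in units])
```

Deleting one code unit at a time leaves an unpaired surrogate behind after the first backspace, which is the same kind of broken UTF-16 discussed above.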