| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ajross 4040 days ago
	The Unixish C runtimes of the world uses a 4-byte wchar_t. I'm not aware of anything in "Linux" that actually stores or operates on 4-byte character strings. Obviously some software somewhere must, but the overwhelming majority of text processing on your linux box is done in UTF-8. That's not remotely comparable to the situation in Windows, where file names are stored on disk in a 16 bit not-quite-wide-character encoding, etc... And it's leaked into firmware. GPT partition names and UEFI variables are 16 bit despite never once being used to store anything but ASCII, etc... All that software is, broadly, incompatible and buggy (and of questionable security) when faced with new code points.