Hacker News new | ask | show | jobs
by sortie 3775 days ago
char16_t and char32_t are useless. The C standard declares functions in <uchar.h> for converting them to and from char, but not wchar_t. The conversion to char may be lossy depending on the platform. No other interfaces uses those types. There's no portably lossless path converting them to and from wchar_t.
2 comments

There are various "solutions" to this problem of holding one "character" per instance of a type. If for some reason you don't want to use char * (for example you want to find the length of a multi-byte-per-character length string), there's https://github.com/cls/libutf
The point is that wchar_t should be removed entirely. Assume it doesn't exist anymore and use char16_t and char32_t everywhere.
This is entirely undesirable. First of all, char16_t and char32_t are kinda useless as there's no standard interfaces using them, and there's no conversion functions to and from wchar_t.

Secondly, no, you're asking for a massive addition of 2 new versions for every interface that mentions wchar_t. That's a huge addition to standard libraries. That's error prone and bloats things up. Then additionally you're asking for a rewrite of all software using wchar_t. And only until everything is transitioned, which isn't going to happen, the standard libraries will be much larger.

The solution is rather to embrace wchar_t and fix it. All sensible and modern platforms, which is a premise of this article on modern POSIX functions, have a 32-bit wchar_t type. That's excellent. It's only Windows, which due to historical short-sightedness that have 16-bit wchar_t. But writing portable C for native Windows is a losing game, the winning move is not to play. (Do see midipix which is upcoming and will provide a new POSIX environment for Windows with musl and 32-bit wchar_t). In fact, 16-bit wchar_t violates the C standard. That moment you give up broken platforms with 16-bit wchar_t, wchar_t works as intended, and this is a non-problem. Embracing char16_t and char32_t is a worse problem and isn't solving anything.