| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sortie 3823 days ago

No, it's important to understand the distinction between char and wchar_t. Both are relevant, but in different contexts. char should be considered a byte type to pass around UTF-8 with. This is the appropriate level for the large majority of common string operations, such as concatenation, outputting strings directly, parsers that only handle ascii characters specially, and so on.

Those applications don't really care about the actual unicode codepoints besides ASCII. If you start to deal with visual representation of strings, calculating the column for error messages, advanced unicode-aware parsing, font rendering, and so on, then you do want to convert on the fly to wchar_t. mbsrtowcs and such are kinda bad, because they convert the whole string at once, which means an allocation that can fail in the unbounded case. It's usually sufficient to decode one wchar_t at a time with mbrtowc.

This way, char and wchar_t are not replacements for each other, but complement each other by being better abstractions for various purposes. Now, the wide stdio functions is where things start to get a bit useless, because the regular stdio char functions are perfectly fine and those functions don't really appeal well to the strengths of wchar_t.