by Tronic2 2264 days ago
char effectively behaves as a signed type, making it unsuitable for binary operations (e.g. UTF-8 manipulation). I/O functions deal with char pointers, so using an unsigned type like uint8_t requires casting back and forth. Is there any way out of this problem, and am I already breaking the aliasing rules with that cast?
3 comments

Plain char is either signed (same representation as signed char) or unsigned (same representation as unsigned char), depending on the implementation.

Yes, there are real-world implementations where plain char is unsigned.
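
A minimal sketch of how one might check which case applies on a given implementation, and read a byte value that does not depend on the answer. The helper names plain_char_is_signed and byte_value are made up for illustration; only CHAR_MIN from <limits.h> is standard.

```c
#include <limits.h>
#include <stdbool.h>

/* True when plain char is signed on this implementation.
   CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: returns the value of a byte held in a plain char
   as a number in 0..UCHAR_MAX, whether or not plain char is signed. */
static unsigned byte_value(char c)
{
    return (unsigned char)c;
}
```

With this, byte_value('\xC3') gives 0xC3 on both kinds of implementation, which is usually what UTF-8 code wants.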

Casting between the three character types is safe and doesn't violate aliasing rules. In addition, objects of all types can be accessed by lvalues of any of the three character types (though unsigned char is recommended), so there's no problem there either.
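
As a rough illustration of that cast for the UTF-8 case in the question, here is a sketch that views a char string through an unsigned char pointer. The function name utf8_count is invented for this example, and it assumes the input is well-formed UTF-8 (it does no validation).

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated string by counting every byte
   that is not a UTF-8 continuation byte (10xxxxxx).
   Assumes well-formed UTF-8; nothing is validated. */
static size_t utf8_count(const char *s)
{
    /* Viewing char data as unsigned char is allowed by the aliasing rules. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

The cast happens once at the boundary, so the rest of the code works with unsigned values and never sees a negative byte.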

I/O functions that take a plain char* are designed to interoperate with char arrays and strings, so passing in unsigned or signed char is a sign that they aren't being used as intended. (Functions that traffic in binary data like fread/fwrite should take void*).
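
To illustrate the void* point: a sketch, with a made-up function name sum_bytes, that reads binary data straight into an unsigned char buffer. Because fread takes void*, no cast is needed.

```c
#include <stdio.h>

/* Sums the values of all bytes remaining in stream.
   The buffer is unsigned char, so 0xFF counts as 255 rather than -1. */
static unsigned long sum_bytes(FILE *stream)
{
    unsigned char buf[256];
    unsigned long sum = 0;
    size_t n;

    /* fread's first parameter is void*, so buf converts implicitly. */
    while ((n = fread(buf, 1, sizeof buf, stream)) > 0) {
        for (size_t i = 0; i < n; ++i)
            sum += buf[i];
    }
    return sum;
}
```

For binary data the stream should be opened in binary mode (for example "rb"), otherwise some platforms translate line endings.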

There are no aliasing differences between uint8_t and char as far as I know.
In practice, no. In theory, it's implementation-defined whether there are differences.
At least from what I've heard, that's because the exact-width types in <stdint.h> are optional.

6.2.5p17 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

and

5.2.4.2.1 says that the width of char, signed char, and unsigned char is the same (CHAR_BIT, which is at least 8).

I don't think it's anything to do with uint8_t being optional. It's because a char might have more than 8 bits.
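
A small sketch of how code could check both points at compile time. HAVE_UINT8 and bits_per_byte are names invented for this example. It relies on <stdint.h> defining UINT8_MAX only when uint8_t itself is provided.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical feature macro: 1 when the implementation provides uint8_t. */
#ifdef UINT8_MAX
#define HAVE_UINT8 1
#else
#define HAVE_UINT8 0
#endif

/* No object is smaller than a char, so an exact 8-bit type can only
   exist where char itself is 8 bits wide. */
#if HAVE_UINT8 && CHAR_BIT != 8
#error "uint8_t exists but char is not 8 bits wide"
#endif

/* Width of the character types on this implementation (at least 8). */
static unsigned bits_per_byte(void)
{
    return CHAR_BIT;
}
```

On an implementation with, say, 16-bit char, HAVE_UINT8 would be 0 and code that needs octets would have to mask values held in unsigned char.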
A conforming implementation could extend the language with an 8-bit type __nonaliasingbyte which has no special aliasing privileges, and define uint8_t as being synonymous with that type.

On the other hand, the Standard should never have given character types special aliasing rules to begin with. Such rules would have been unnecessary if the Standard had noted that an access to an lvalue which is freshly visibly derived from another is an access to the lvalue from which it is derived. The question of whether a compiler recognizes a particular lvalue as "freshly visibly derived" from another is a Quality of Implementation issue outside the Standard's jurisdiction.