by tialaramex 1138 days ago
No. The type of 'x' is int. It so happens on your platform (and most available systems today) sizeof(int) == 4.

The type of letter was explicitly char, and sizeof(char) == 1 by definition in C.

char letter = 'x'; is an implicit conversion. That literal is an int with the value 120 (assuming ASCII), which is then converted to fit in the char type, an integer type of at least 8 bits (might be signed, might not, doesn't matter in this case).
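A minimal sketch of the point above, in case it helps to see it compile. The function names are made up for illustration; the only thing being shown is that sizeof applied to the literal gives the size of int, while sizeof applied to the char object gives 1.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof(char) is 1 by definition, so this holds on every C implementation. */
static_assert(sizeof(char) == 1, "sizeof(char) is always 1 in C");

/* Hypothetical helper: in C (unlike C++), the constant 'x' has type int,
   so this returns sizeof(int), which is 4 on most current platforms. */
size_t size_of_char_constant(void) {
    return sizeof('x');
}

/* Hypothetical helper: once the int value is converted and stored in a
   char object, that object occupies exactly one byte. */
size_t size_of_char_object(void) {
    char letter = 'x'; /* implicit conversion from int to char */
    return sizeof(letter);
}
```

Note that the same source compiled as C++ would behave differently, since there 'x' has type char.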


People often forget that multi-character constants like 'ab' or '1ert' are allowed in C. They are almost unusable because their value is implementation-defined, which makes them highly non-portable (for example, because of endianness issues between the front-end and the back-end).
Once in a while this is kind of useful for things like a FourCC, data layout issue aside.

Rust has e.g. u32.to_le_bytes() to turn an integer into some (little endian) bytes, but I don't know if there's a trivial way to write the opposite and turn b"1ert" (which is an array of four bytes) into a native integer.

Edited to add: oh yeah, it has u32::from_le_bytes(*b"1ert"). I should have checked.
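For comparison, here is a rough C sketch of the same idea that avoids multi-character constants entirely. The helper name u32_from_le_bytes is hypothetical (it is not a standard C function); it just packs four bytes with shifts, so the result does not depend on the host's endianness or on how the compiler treats '1ert'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, loosely mirroring Rust's u32::from_le_bytes:
   bytes[0] becomes the least significant byte of the result. */
uint32_t u32_from_le_bytes(const unsigned char bytes[4]) {
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}
```

Calling it on the four bytes of "1ert" on an ASCII system should give 0x74726531, the same on little-endian and big-endian hosts.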

Does this mean that `word` in

  char *word = "xyz";
is a pointer to an array of four `int`s, `'x'`, `'y'`, `'z'`, and `'\0'`? When I evaluate

  sizeof(*word)
I do get 1 instead of 4, even though `*word` is pointing to `'x'`. Where are the remaining 3 bytes in memory?
A char is 1 byte by definition. But the type of a character literal (the 'x' syntax) is not a char, but an int instead.

The C type system generally matters so little that the type of an expression has little relevance (sizeof is the most notable exception to that rule), which obscures this fact.

Not at all. There are no character literals in "xyz", this is a string literal and it's unrelated to what your parent was saying.
word is of type char*, a pointer to a (single) object of type char.

The initializer means that the char object it points to happens to be the first (0th) element of an array containing 4 elements with values 'x', 'y', 'z', and '\0'.

Most manipulation of arrays in C is done via pointers to the individual elements, and arithmetic on those pointers. (Incrementing a pointer value yields a pointer to the next element in the array.)

For example, `sizeof word` gives you the size of the pointer object, but `strlen(word)` yields 3, because it calls a library function that performs pointer arithmetic to find the trailing '\0' that marks the end of the string. (A "string" in C is a data layout, not a data type.)
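To make the pointer arithmetic concrete, here is a small sketch of what a strlen-like function does. my_strlen is a hypothetical stand-in written for illustration, not how any particular C library implements strlen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of strlen: walk a pointer forward one
   element at a time until it reaches the terminating '\0', then return
   the distance travelled. */
size_t my_strlen(const char *s) {
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0') {
        p++; /* advance to the next char in the array */
    }
    return (size_t)(p - s);
}
```

With `const char *word = "xyz";`, my_strlen(word) is 3, while `sizeof word` is still just the size of a pointer and `sizeof *word` is 1.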

If you specifically type it as char * then it's a pointer to chars, each of which has size 1.
You'll have to understand the 'x' syntax and the "xyz" syntax as two different things. Different quotes.
I know. But my understanding was that `"xyz"` is an array of characters so that these two would have the same representation in memory:

  char word[] = {'x', 'y', 'z', '\0'};  // sizeof(word) = 4, sizeof(*word) = 1
  char word[] = "xyz";                  // sizeof(word) = 4, sizeof(*word) = 1
What I did not realize was that the above two are not the same as this:

  char *word = "xyz";  // sizeof(word) = 8, sizeof(*word) = 1
The representation of an object is determined by how the object itself is defined.

An initializer doesn't change that. It only affects the value stored in the object when it's created.

One special case is that an array object defined with empty square brackets gets its length from the initializer, so

    char word[] = "xyz";
is a shorthand for, and is exactly equivalent to:

    char word[4] = "xyz";
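A quick sketch that checks this at compile time. The variable names are made up for the example; the point is that the three array definitions all produce a 4-byte object, while the pointer definition produces an object the size of a pointer, whatever it happens to point at.

```c
#include <assert.h>
#include <string.h>

/* Three ways to define the same 4-element char array (names are illustrative). */
char from_list[]     = {'x', 'y', 'z', '\0'};
char from_string[]   = "xyz";
char explicit_len[4] = "xyz";

/* A pointer object whose value is the address of a string literal's first char. */
const char *as_pointer = "xyz";

static_assert(sizeof from_list == 4, "array length comes from the initializer");
static_assert(sizeof from_string == 4, "3 characters plus the terminating '\\0'");
static_assert(sizeof explicit_len == 4, "same as the empty-bracket form");
static_assert(sizeof as_pointer == sizeof(char *), "size of the pointer, not the string");

/* Hypothetical helper: returns 1 if all three arrays hold identical bytes. */
int arrays_match(void) {
    return memcmp(from_list, from_string, sizeof from_list) == 0
        && memcmp(from_string, explicit_len, sizeof from_string) == 0;
}
```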
What I see there is that you seem to highlight the difference between using sizeof with an array and sizeof with a pointer, which makes a difference, even if array-decays-to-pointer is a rule in most other contexts.
Right, I am mixing up two things here. You are right that bringing up pointers here is a mistake.

But apart from that, I would expect `{'x', 'y', 'z', '\0'}` to have size 16 rather than size 4 because it consists of four character literals which each have size 4 on my machine.

Maybe do not overthink it. 'x' is called a character literal, but it has the type int.

`{'x', 'y', 'z', '\0'}` does not have a type by itself, but it's valid syntax to use it to initialize various structs and arrays - some of those will have the size you are looking for, depending on which type of array or struct you choose to initialize with that: https://gcc.godbolt.org/z/Tqjq3xzKo
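In the same spirit as the link above (this is my own sketch, not a copy of it), the same brace-enclosed list can initialize arrays of different element types, and the resulting size follows the declared type rather than the type of the literals. The array names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* The same initializer list used with two different element types. */
char as_chars[] = {'x', 'y', 'z', '\0'}; /* each int value is converted to char */
int  as_ints[]  = {'x', 'y', 'z', '\0'}; /* each int value is stored as an int  */

static_assert(sizeof as_chars == 4 * sizeof(char), "four chars, one byte each");
static_assert(sizeof as_ints == 4 * sizeof(int), "four ints, sizeof(int) bytes each");

/* Hypothetical helper: number of elements in as_ints, computed the usual way. */
size_t as_ints_count(void) {
    return sizeof as_ints / sizeof as_ints[0];
}
```

So the size 16 only shows up if you ask for an array of int; on a platform where sizeof(int) is 4, sizeof as_ints is 16 and sizeof as_chars is 4.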

sizeof() returns the number of "units" that something -- an expression or a type -- takes up. What do you think those units are?

They are literally defined as "characters". sizeof(char) is always 1.

Your confusion (besides the pointer thing) is that 'x' is a funny way to write an int, not a char.