by tjoff 2378 days ago
Interesting, but I think the notion of a byte in C is different. But I'm not able to look it up at the moment.

It is, but it's defined rather weirdly:

"byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment"

Hence why the type that corresponds to it is "char"! Beyond that, the only thing that kinda sorta implies that it's the smallest addressable unit is the definition of CHAR_BIT:

"number of bits for smallest object that is not a bit-field (byte)"
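For what it's worth, here is a minimal sketch of how those two definitions surface in code. The helper name `bits_in` is made up for illustration; `CHAR_BIT` and the `sizeof(char) == 1` guarantee come from the standard, and `_Static_assert` assumes a C11 compiler.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* The standard requires a byte to have at least 8 bits. */
_Static_assert(CHAR_BIT >= 8, "C requires CHAR_BIT >= 8");

/* sizeof counts in bytes, and a char is one byte by definition,
   regardless of how many bits that byte has. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Hypothetical helper: storage width of an object in bits
   (this includes any padding bits the type may have). */
static size_t bits_in(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

On a typical desktop `bits_in(sizeof(char))` is 8, but nothing in the code depends on that; it only depends on `CHAR_BIT`.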

In other words, I think what you're saying is that the C standard defines sizeof(char) == 1, so a char is always exactly one byte. However, different architectures can have an addressable unit that is not 8 bits wide, so 1 byte is not always 8 bits.

This might be why the standard defines the code space alphabet (the basic character set): it at least puts an emphasis on 8 bits == 1 byte.

C definitely doesn't require bytes to have 8 bits - it only requires them to have at least 8 bits. And there are architectures on which a C char has as many bits as an int (SHARC, for example).

The question, though, was about whether it's the minimum addressable unit of memory. In the C memory model, it is, but by implication - you can't have two pointers that compare non-equal, but differ by less than 1, so a type with sizeof==1 is by definition the smallest you can uniquely address. However, the C memory model doesn't have to reflect the underlying hardware architecture.
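A rough sketch of that implication, with made-up helper names: pointer subtraction on `char *` counts in units of `sizeof(char)`, so two distinct chars in the same array are never less than 1 apart, and any larger object is a whole number of chars wide.

```c
#include <stddef.h>   /* ptrdiff_t */

/* Hypothetical helper: distance between two adjacent chars.
   This is 1 on every conforming implementation. */
static ptrdiff_t distance_between_adjacent_chars(void)
{
    char pair[2] = {0, 0};
    return &pair[1] - &pair[0];
}

/* Hypothetical helper: distance between two adjacent ints, measured
   by viewing their addresses as char pointers. The result is
   sizeof(int), which may itself be 1 on a machine where char and
   int have the same width. */
static ptrdiff_t byte_distance_between_adjacent_ints(void)
{
    int pair[2] = {0, 0};
    const char *first  = (const char *)&pair[0];
    const char *second = (const char *)&pair[1];
    return second - first;
}
```

There is no pointer type whose valid values can sit between `&pair[0]` and `&pair[1]` in the char case, which is the sense in which char is the smallest addressable unit of the C memory model.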

SHARC has no such requirement. Having char and int the same size was not universal. The CPU vendor shipped such a compiler, but that was not the only compiler.

The CPU itself used 32-bit addresses to access machine words, the size of which was determined by what was being accessed. External memory was limited to 32-bit. Internal memory had regions that could be 32-bit, 40-bit, or 48-bit. An address increment of 1 would thus move by that many bits.

Mercury Computer Systems shipped a byte-oriented port of gcc. Pointers to char and short were rotated and XORed as needed to reduce incompatibility. Pointers to larger objects were in the hardware format. This allowed a high degree of compatibility with ordinary software while still running efficiently when working with the larger objects. There was also a 64-bit double, unlike the 32-bit one in the other compiler. Data structures were all compatible with PowerPC and i860, allowing heterogeneous shared memory multiprocessor systems.

You can implement byte addressing on any architecture, of course. That's what I meant by "the C memory model doesn't have to reflect the underlying hardware architecture". But as you point out yourself, this requires pointers which are basically not raw hardware addresses, and which are more expensive to work with, because they require the compiler to do the same kind of stuff it has to do for bit fields. So the natural implementation - with no unexpected perf gotchas - tends towards pointers as raw hardware addresses, and thus char as the smallest unit those can address.
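To make the cost concrete, here is a minimal sketch of emulated byte addressing on a word-addressed machine. Everything in it is an assumption for illustration: 32-bit words, four 8-bit bytes per word, the lowest-order byte first, and a `byte_ptr` encoding of `(word_index << 2) | byte_position`. It is not the scheme the Mercury port actually used. It also models the target on an ordinary host, since `uint8_t` only exists where bytes are 8 bits.

```c
#include <stdint.h>   /* uint8_t, uint32_t */

/* Hypothetical emulated byte pointer: the upper bits are the hardware
   word address, the low 2 bits select a byte inside that word. */
typedef uint32_t byte_ptr;

/* Load one byte: fetch the whole word, then shift and mask. */
static uint8_t load_byte(const uint32_t *memory, byte_ptr p)
{
    uint32_t word  = memory[p >> 2];      /* raw hardware address */
    unsigned shift = (p & 3u) * 8u;       /* bit offset within the word */
    return (uint8_t)((word >> shift) & 0xFFu);
}

/* Store one byte: read-modify-write of the containing word,
   the same kind of work a compiler does for a bit-field. */
static void store_byte(uint32_t *memory, byte_ptr p, uint8_t value)
{
    unsigned shift = (p & 3u) * 8u;
    uint32_t mask  = (uint32_t)0xFFu << shift;
    uint32_t *word = &memory[p >> 2];
    *word = (*word & ~mask) | ((uint32_t)value << shift);
}
```

Every char access turns into a word access plus shifts and masks, while pointers to word-sized objects can stay as plain hardware addresses. That asymmetry is the perf gotcha.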

It may well vary depending on which C standard you're talking about. ISO C defines both a byte and a char as at least large enough to contain any character "of the basic character set of the execution environment". They must be uniquely addressable, although it seems their definitions don't preclude them from being different, or sub-bytes from being uniquely addressable by pointers.