Hacker News
by gnode 2379 days ago
Pointers, as a language concept, don't have to correspond to the addressing schemes of the hardware or ISA. On some architectures instructions may only be able to address aligned whole words. Some microcontrollers (e.g. Intel MCS-51) feature bit-addressable memory. Apparently, there's a special __bit type supported by the Small Device C Compiler for using bit-addressable memory on such devices, although I don't know if it supports taking pointers to these.
4 comments

They do not have to. But then it wouldn't be C, which by design has a straightforward and obvious mapping to the underlying machine.

For example, there are machines (some DSPs) on which individual octets are not efficiently addressable, and a C byte on these machines is usually 16 or 32 bits.

Pointers are very much a language concept and very much not an architecture concept. I enjoy this particular writeup that touches on some of the distinctions. Of particular interest is the fact that the C standard itself states that two pointers are not equivalent simply by virtue of having the same address value.

https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

I also happen to very much enjoy this piece on how the C abstract machine has very little in common with modern architecture.

https://queue.acm.org/detail.cfm?id=3212479

This exchange was an enjoyable read. C was designed for portability because they had those PDP computers or whatever they were, but the problem was that each had its own unique architecture, switch arrangement for operation and maybe even endianness. I don't know. The whole point was to make the language portable enough that anyone who wanted to could write a compiler for their architecture. Why people do not like that I cannot comprehend.
They don't have to, but they're commonly understood to refer to memory addresses, which, on most ISAs, are locations of octets.

Even if the ISA only allows word- or dword-aligned loads from memory, the addresses still typically enumerate bytes, not words or dwords.
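A rough sketch of that point in C, assuming a typical flat-address implementation where converting a pointer to uintptr_t yields a plain byte address (the standard leaves that conversion implementation-defined). The helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes between two adjacent 32-bit words.
   Measured through char pointers, so the result is sizeof(uint32_t). */
static ptrdiff_t word_stride_in_bytes(void)
{
    uint32_t words[2] = {0, 0};
    return (const char *)&words[1] - (const char *)&words[0];
}

/* Hypothetical helper: is p suitably aligned for a uint32_t access?
   Assumes the uintptr_t value is a byte address, which is common
   but not guaranteed by the standard. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

Where CHAR_BIT is 8, word_stride_in_bytes() comes out as 4: neighbouring words sit four addresses apart even if the hardware refuses to load from the three addresses in between.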

Based on a quick summary of the MCS-51 that I googled up, it looks like its memory addressing scheme still assigns addresses to bytes, and has special operations that allow you to further specify a bit offset within that memory address.

> it looks like its memory addressing scheme still assigns addresses to bytes, and has special operations that allow you to further specify a bit offset within that memory address.

There are also instructions that take an 8-bit bit address, with 0x00 - 0x7f corresponding to lower memory and 0x80 - 0xff corresponding to 16 specific registers in the Special Function Register set.
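For what it's worth, here is how I understand that mapping on a standard 8051, written as a small C sketch. It assumes the bit-addressable RAM is the 16 bytes at 0x20 - 0x2f and that the bit-addressable SFRs are the ones whose byte address is a multiple of 8. The struct and function names are invented for the example.

```c
#include <stdint.h>

/* Where a given 8051 bit address lives (names are hypothetical). */
struct bit_location {
    uint8_t byte_addr; /* direct byte address that holds the bit */
    uint8_t bit_index; /* 0..7 within that byte */
};

static struct bit_location decode_bit_address(uint8_t bit_addr)
{
    struct bit_location loc;
    loc.bit_index = (uint8_t)(bit_addr & 0x07);
    if (bit_addr < 0x80) {
        /* 0x00-0x7f: the 16 bit-addressable RAM bytes at 0x20-0x2f */
        loc.byte_addr = (uint8_t)(0x20 + (bit_addr >> 3));
    } else {
        /* 0x80-0xff: SFRs whose byte address is a multiple of 8 */
        loc.byte_addr = (uint8_t)(bit_addr & 0xF8);
    }
    return loc;
}
```

So bit address 0x7f decodes to bit 7 of RAM byte 0x2f, and 0xd7 decodes to bit 7 of the SFR at 0xd0.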

The 8051 has bit addressable memory.
Isn't a byte supposed to correspond to the smallest addressable unit of memory?
The original usage of the term "byte" was to refer to fields of variable length consecutive bits on a bit-addressable machine: https://en.wikipedia.org/wiki/Byte#History

Nowadays a byte is conventionally eight bits, especially for measures like "megabyte", but the term octet is often used to avoid ambiguity. Pointers commonly hold byte addresses, yet often only words are addressable by machine instructions (e.g. many ARM instructions take a byte address yet raise a hardware exception on use of unaligned addresses).

Interesting, but I think the notion of a byte in C is different. But I'm not able to look it up at the moment.
It is, but it's defined rather weirdly:

"byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment"

Hence why the type that corresponds to it is "char"! Beyond that, the only thing that kinda sorta implies that it's the smallest addressable unit is the definition of CHAR_BIT:

"number of bits for smallest object that is not a bit-field (byte)"

I think what you're saying, in other words, is that the C standard defines sizeof(char) == 1, so a char is always exactly one byte. However, different architectures can have an addressable unit of a size other than 8 bits, so 1 byte is not always 8 bits.

This might be why the code space alphabet is defined by the standard so it will at least put an emphasis on 8 bits == 1 byte.

C definitely doesn't require bytes to have 8 bits - it only requires them to have at least 8 bits. And there are architectures on which C char has as many bits as int (SHARC).
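A minimal sketch of those guarantees as compile-time checks, assuming a C11 compiler. BITS_OF is a made-up helper macro, not something from the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Both of these hold on every conforming implementation. */
static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");
static_assert(sizeof(char) == 1, "char is one byte by definition");

/* Hypothetical helper: size of a type in bits, padding bits included. */
#define BITS_OF(type) (sizeof(type) * (size_t)CHAR_BIT)
```

On a mainstream desktop BITS_OF(char) is 8. On a platform like the one described above, where char is as wide as int, it would equal BITS_OF(int) while sizeof(char) is still 1.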

The question, though, was about whether it's the minimum addressable unit of memory. In the C memory model, it is, but by implication - you can't have two pointers that compare non-equal, but differ by less than 1, so a type with sizeof==1 is by definition the smallest you can uniquely address. However, the C memory model doesn't have to reflect the underlying hardware architecture.
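To make the "by implication" part concrete, here is a small sketch: stepping an unsigned char pointer across an int visits exactly sizeof(int) distinct addresses, and there is no way to form a pointer that lands between two of them. The function name is invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: count the distinct char-sized addresses in an int. */
static size_t count_byte_addresses_in_int(void)
{
    int value = 0;
    const unsigned char *first = (const unsigned char *)&value;
    const unsigned char *end = first + sizeof value; /* one past the object */
    size_t count = 0;
    for (const unsigned char *p = first; p != end; ++p) {
        ++count; /* each increment moves by exactly one C byte */
    }
    return count;
}
```

This only describes the C memory model. Whether the hardware can address something smaller (or only something larger) is invisible at this level.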

It may well vary depending on which C standard you're talking about. ISO C defines both a byte and a char as large enough to contain characters "of the basic character set of the execution environment". They must be uniquely addressable, although it seems their definitions don't preclude them from being different, or sub-bytes from being uniquely addressable by pointers.