Hacker News
by ChrisSD 2485 days ago
> As I have said previously, memory is like a huge array with (say) 0xffffffff elements. A pointer in C is an index to this array. Thus when a C pointer is 0xefffe034, it points to the 0xefffe035th element in the memory array (memory being indexed starting with zero).

I'm not sure how true this is outside of a particular platform/compiler. As far as I'm aware, C doesn't actually define how pointers are represented, only that they are a reference to memory (although null is a special case). Pointers in C are very abstract, which allows for much more aggressive optimisations.

And all this is before we get into how memory actually works in practice, such as CPU cache lines.
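
To make the representation point concrete, here is a minimal sketch. The helper names `pointer_bytes` and `null_is_all_zero_bits` are made up for illustration. The only thing C lets you do portably with a pointer's stored bytes is copy them around as `unsigned char`; how many there are and what they mean is up to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of a pointer into
   out (at most cap bytes) and returns how many bytes a void * occupies on
   this implementation. */
size_t pointer_bytes(const void *p, unsigned char *out, size_t cap) {
    size_t n = sizeof p;                 /* implementation-defined size */
    assert(out != NULL);
    memcpy(out, &p, n < cap ? n : cap);  /* raw bytes; C does not say what they mean */
    return n;
}

/* Hypothetical helper: a null pointer compares equal to 0 in source code,
   but C does not promise that its stored bytes are all zero. This only
   reports what the current platform happens to do. */
int null_is_all_zero_bits(void) {
    void *p = NULL;
    unsigned char zero[sizeof p] = {0};
    return memcmp(&p, zero, sizeof p) == 0;
}
```

On a typical desktop target I'd expect 8 bytes and an all-zero null, but neither is something the language guarantees.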

4 comments

You do have to be able to cast from a pointer to an appropriately-sized integer and back, however [1]. This makes the semantics fuzzy and ill-defined in some cases [2].

[1]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm#q3...

[2]: https://blog.regehr.org/archives/1621
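
A minimal sketch of that round trip, assuming the implementation provides `uintptr_t` (it is an optional type, though mainstream compilers have it). The function names are mine, not from either link.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a pointer to an integer and back again. */
void *round_trip(void *p) {
    uintptr_t bits = (uintptr_t)p;  /* the numeric value is implementation-defined */
    return (void *)bits;            /* required to compare equal to p */
}

/* Hypothetical helper: reads an int through a pointer that took the
   integer detour. */
int read_via_integer(int *p) {
    int *q = round_trip(p);
    assert(q == p);
    return *q;
}
```

The guarantee only covers getting back a pointer that compares equal. What the integer looks like in between, and what happens if you do arithmetic on it before converting back, is where it gets fuzzy.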

Here's a real-world C compiler where sizeof() of everything is 1: https://github.com/vsedach/Vacietis

For another example, the LLVM WebAssembly backend doesn't put the call stack in the same address space as the heap at all.

Indeed, and it was really fun to work with pointers in programs targeting 16-bit (real mode) MS-DOS.
If you thought segmented memory was weird, then try something like an 8051 (3-byte "generic" pointers, stored in semi-big-endian order) or other Harvard-architecture microcontroller.
Yes, you can implement C in other ways (I've worked on a C JIT that abstracts from this flat memory model, for example), but come on: we all know this is how C works on most machines most of the time. They shouldn't need to add a lot of disclaimers that it could theoretically be done a different way when they're just trying to raise awareness of how things work in practice.
The issue is that C does not work that way on modern machines. Note that old Alpha machines had doubleword-aligned pointers and no byte or word load instructions, so indexes into the array had to be multiples of 4. More important, aliasing rules preclude treating memory like one big array: https://gist.github.com/shafik/848ae25ee209f698763cffee272a5.... C99 and newer go to some lengths to permit the optimizer to treat pointers as pointing into disjoint byte ranges (which allows the optimizer to assume they cannot alias). Accordingly, the mental model of a big array of memory is, at least for C, generally unsound.
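
A rough sketch of what the aliasing rules mean in practice. This is not taken from the gist, and the names `store_both` and `float_bits` are made up. It assumes `float` is 32 bits wide, which the code checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* int and float are incompatible types, so the compiler may assume *i and
   *f are different objects and return 1 without reloading *i. */
int store_both(int *i, float *f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Well-defined way to look at a float's bits: copy the bytes rather than
   reading the float through a uint32_t pointer. */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);  /* assumption: 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the big-array model you would expect `store_both` to return whatever bytes the float store left behind when both arguments point at the same storage. In C that call is simply undefined, and an optimizing compiler is free to return 1 anyway.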
Nitpick: The same rules were also present in C89.
Thanks for the correction!