

	by nedbat 878 days ago
	Help us understand: what is the model of memory in C?

4 comments

cornstalks 878 days ago

C supports systems with segmented memory (https://en.m.wikipedia.org/wiki/Memory_segmentation). You also aren’t allowed to observe memory outside of allocations you’ve made, or even to create pointers to addresses outside them (whether or not you ever dereference those pointers), so you really can’t observe much about the memory model.

But I think the OP’s point is more about how when you allocate memory, you get an array of bytes to play with. And all higher level languages build on top of that and abstract it away as much as they can.

AnimalMuppet 878 days ago

Define "allowed".

Will I get arrested? No.

Will the compiler stop me? Also no.

Will the program crash? Maybe. Almost certainly if I do it often, or without understanding.

You aren't guaranteed to be safe if you access memory or addresses outside of allocations you've made (with stack and static memory counting as "allocations you've made").

But on embedded systems with memory-mapped I/O, I have done things like

  *(unsigned long *)0xFFFE1404 = 0x00011472;

in order to write values to the registers of a peripheral device. Those I/O registers were memory that I "owned", even though I never allocated it in any way.
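A minimal sketch of the same pattern, written so that it also runs on a hosted system. The helper names and `fake_register` are hypothetical, and the 32-bit register width is an assumption; on real hardware the address would come from the device datasheet rather than from a C variable. The `volatile` qualifier is the usual addition here, so the compiler does not drop or merge the accesses.

```c
#include <stdint.h>

/* Stand-in for a peripheral register (hypothetical). On an embedded target
 * you would pass a fixed address such as 0xFFFE1404 instead. */
uint32_t fake_register;

/* Store a 32-bit value at the given address. volatile marks the access as
 * having side effects, so the compiler must actually perform it. */
void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Load a 32-bit value from the given address, again as a volatile access. */
uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

Usage would look like `reg_write32((uintptr_t)&fake_register, 0x00011472u);`. Whether an arbitrary integer address is meaningful is up to the implementation and the hardware.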

cornstalks 878 days ago

The C standard does not allow it. It’s simply undefined behavior under the standard, and all bets are off as far as C is concerned.

But of course an implementation is free to define additional behaviors beyond the C specification. That’s done all the time. But that’s really a “flavor” of C and not pure “vanilla” C.

trealira 878 days ago

No, it's implementation-defined, not undefined behavior. That means the compiler must document a consistent behaviour. From 6.3.2.3 [ISO/IEC 9899:2011]:

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Also, the C standard merely codified existing practices and common extensions. Actual use of C has converted integers to pointers for a long time. If converting integer literals into pointers were undefined behavior, it would just show that the C standard isn't being practically useful in one area (since it's commonly done in practice).
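As a small illustration of the conversions quoted above, here is a sketch of a pointer round trip through `uintptr_t`. The function name is made up for the example. The standard guarantees that a valid `void *` converted to `uintptr_t` and back compares equal to the original (where `uintptr_t` exists; it is an optional type); what the integer value itself looks like is implementation-defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a pointer to an integer and back, then check that the result
 * still compares equal to the original pointer. */
bool round_trips(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* pointer -> integer (implementation-defined value) */
    void *q = (void *)bits;         /* integer -> pointer */
    return q == p;
}
```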

AnimalMuppet 878 days ago

> Actual use of C has converted integers to pointers for a long time.

Quite probably since the first C-based implementation of Unix on a PDP-11. So it's been known to be "a thing that C does" for quite a bit longer than the standard existed.

xigoi 878 days ago

If you want your program to work on more than one architecture+compiler combination, then implementation-defined is effectively the same as undefined.

AnimalMuppet 878 days ago

Not entirely.

Say I'm doing my original example, working on an embedded system. My code isn't going to port to anything that doesn't have the same hardware, so architecture isn't an issue. Any compiler supporting that architecture for an embedded application is going to do the right thing with that kind of C statement, so that isn't an issue either.

So, implementation-defined means that you can't count on compilers doing the same thing. But there are some things that are implementation-defined where you can pretty much count on any compiler doing the same thing. And, as trealira said, compilers are supposed to clearly state what they do in such cases, so you can read the compiler's statement and see if there are any surprises.

This is less true if you're writing library code. There, you have to support all compilers, or at least all conforming ones, and you have to make fewer assumptions.

trealira 878 days ago

You probably know this, but on C compilers targeting the 16-bit x86 CPUs, which have segmented memory, you have near and far pointers to accurately reflect its memory model. Near pointers are 16-bit offsets into the current segment, whereas far pointers consist of a 16-bit segment selector and a 16-bit offset.

https://en.wikipedia.org/wiki/Far_pointer

https://en.wikipedia.org/wiki/X86_memory_models

However, this is not standard C.
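For illustration, a sketch in standard C of how a real-mode far pointer maps to a linear address, assuming the 8086 rule of segment * 16 + offset. The `far_ptr` struct and `far_to_linear` are hypothetical stand-ins, not what any compiler's `far` keyword actually expands to, and protected-mode selectors work differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit x86 far pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Real-mode translation: shift the segment left by 4 bits (multiply by 16)
 * and add the offset. The 1 MiB wraparound of the 8086 is not modeled. */
uint32_t far_to_linear(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

Note that different segment:offset pairs can name the same byte: B800:0000 and B000:8000 both translate to linear address 0xB8000.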

deaddodo 878 days ago

Memory is a term describing the semi-permanent (permanent while it receives power) state of data as it exists in fast storage. If you try to access random areas of said memory, you will be sorely disappointed, as the OS instead allocates you memory as needed and bars tasks from accessing each other's memory.

In other words, location 0x0000ffff in your application does not map to system location 0x0000ffff, but instead a translated portion of said block. In addition, there are no guarantees as to how that memory will be ordered/allocated/segmented outside of specific requests for a contiguous block of memory via something like malloc. You can assume your array (static and dynamic) is contiguous, but that's the only assumption you can make.

vacuity 878 days ago

I wouldn't call that the C memory model, as it's imposed by the OS on all programs.

deaddodo 878 days ago

C requires an OS to provide CRT and syscalls for it to function. So you're correct in that the memory model is more akin to "what the OS decides to offer you", with the singular guarantee that specific pieces of data will be contiguous.

If you were to write freestanding C code, this certainly changes. At which point the memory model becomes "what you decide to provide".

yjftsjthsd-h 878 days ago

> C requires an OS to provide CRT and syscalls for it to function.

That doesn't make sense; the OS itself is written in C.

> If you were to write freestanding C code, this certainly changes. At which point the memory model becomes "what you decide to provide".

Okay, so we're back to, C doesn't impose anything on you, though the OS may.

vacuity 878 days ago

The C memory model is what imposes pointer provenance and all that jazz on C programs. The OS provides an abstraction over physical memory, and programming languages abstract over individual address spaces.

deaddodo 878 days ago

> That doesn't make sense; the OS itself is written in C.

Please point to a non-userspace memory allocation library/implementation that does not rely on lower level logic to function.

> Okay, so we're back to, C doesn't impose anything on you, though the OS may.

You're just reversing the original statement.

The statement was "you can't assume anything about memory in C" (paraphrasing). They then asked "why not?"

The explanation is that:

What you think is 'memory' in C isn't, and certainly doesn't map to what most people assume about memory; because C doesn't impose a memory model, it relies on the underlying OS/environment to do so. The only "memory model" is thus: a requested allocation (whether on the stack or heap), if provided to you, will match your request; all other assumptions are invalid.

If you want to move goalposts, argue about logical boundaries, etc, have at it. Or if that answer doesn't satisfy you, I don't know what to tell you; but simply rephrasing the original problem does even less.

nedbat 878 days ago

Except you can't see that model from (for example) Python, and you don't need to.

vacuity 878 days ago

Python presents an abstraction, but underneath, it's bound by the address space just as C is.

nedbat 878 days ago

Yes, but why do I need to know that as a Python programmer? What mistakes might I make if I don't know about it?

vacuity 878 days ago

I was just making it clear that the OS policy pervades. That is, the developers of CPython or another execution context need to care about it, and so it informs some of the implementation details that may leak into programs. We agree that Python presents an abstraction so that Python programmers (usually) don't have to see this.

jacurtis 878 days ago

I'm not sure that abstraction is even in the OS. I think it's on the memory controller itself. The OS reads from the memory controller.

yjftsjthsd-h 878 days ago

I'm pretty sure it's more of a cooperative arrangement; the OS writes the configuration that the MMU uses, thereby imposing its rules on programs.

vacuity 878 days ago

The OS controls the MMU.

yjftsjthsd-h 878 days ago

I mean, sure, that's also a valid description. If we're to be precise, at least on x86 the OS sets up a global descriptor table[0] and then uses the LGDT instruction to tell the CPU to use it[1], and LGDT is a privileged instruction so only the OS (kernel) can do that.

[0] https://wiki.osdev.org/Global_Descriptor_Table

[1] https://wiki.osdev.org/GDT_Tutorial#Telling_the_CPU_Where_th...

camgunz 877 days ago

This isn't really responsive to the question. The C memory model is basically defined in stdatomic.h [0]. It's around 10 pages and describes affordances you almost never, ever see.

> In addition, there are no guarantees as to how that memory will be ordered/allocated/segmented outside of specific requests for a contiguous block of memory via something like malloc. You can assume your array (static and dynamic) is contiguous...

The standard can only be referring to the abstract machine. In truth, your OS might give you 1,000,000 elements on 1 page, and 1,000,000 elements on a different page, which exist nowhere near each other in RAM (or have been swapped to disk, etc. etc.), or are being CoW'd into existence, and so on.

This is empirically true--from the days when programs like Chromium would try and malloc all the memory in a machine. The pointer that malloc returned could not have referred to a contiguous memory block. You can try it on your machine by malloc'ing more memory than you have and then reading from the blob.

[0]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf#s...

deaddodo 871 days ago

> The standard can only be referring to the abstract machine. In truth, your OS might give you 1,000,000 elements on 1 page, and 1,000,000 elements on a different page, which exist nowhere near each other in RAM (or have been swapped to disk, etc. etc.), or are being CoW'd into existence, and so on.

This is where the "model" portion of the statement comes into play and why OP's point is even more cogent. The memory will be virtually contiguous, even if it's physically disparate.

camgunz 871 days ago

> The memory will be virtually contiguous, even if it's physically disparate.

I don't really understand the value of extolling the fact that like, you can incrementally iterate through an array in C. The days of systems with segmented memory are long behind us, and even then it wasn't like you'd have an array that spanned segments--you couldn't! Honestly what language/platform exists that doesn't have this property and what would that even look like? Like you'd somehow have to know that elements 100-200 in an array are "no good" and you have to skip them?

People keep trying to put meat on the very basic bones you've got here, but you keep insisting that, yep, 2 comes after 1 and 717 comes after 716. Great! We know! And we say stuff about how trying to index off the end of an array leads to UB, or how the array may not actually be contiguous in physical memory, or it might be in various caches, or how sometimes your array goes from 1024 members to 1025 members and your FPS drops from 300 to 30 because you fell out of cache, and you're like, "sure but you still access arrays with consecutive indexes". Well, yeah! It's kind of the point of using C that you have access to or some control over these kinds of things. They're useful to the conversation. Continually bringing us back to indexing... I don't think is.

> This is where the "model" portion of the statement comes into play and why OP's point is even more cogent.

Eh, "memory model" is a specific phrase referring to how memory is defined to work in a threaded environment [0]. The original "Help us understand: what is the model of memory in C?" prompt is referring to the fact that unless you prolifically use the API in stdatomic.h (which very few things do), you just have undefined behavior all over the place if you ever dare to use anything related to threads. Also, it was only defined in C11--not a lot of things have updated, even now.
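To make that concrete, here is a minimal sketch of the kind of thing stdatomic.h is for: publishing plain data from one thread to another with a release store and an acquire load. The names `payload`, `ready`, `publish` and `try_consume` are made up for this example, and thread creation is left out so the snippet stays self-contained; the two functions are meant to be called from different threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic data */
static atomic_bool ready;  /* publication flag; static storage starts out false */

/* Writer side: fill in the data, then set the flag with release ordering so
 * the write to payload becomes visible before the flag does. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load that sees the flag set also sees the payload
 * written before the matching release store. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the atomic flag (say, a plain `bool`), concurrent calls to these two functions would be a data race, which is undefined behavior under the C11 model.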

---

Overall I want to emphasize that this whole thread is doing a real good job of proving Ned Batchelder's point: even people who think they know or understand C don't (to be clear, I do not think I understand C), and things are generally alright. There's--clearly, reading through everything--a culture of "You need to be an expert in some low-level, 'real' language/platform before you can write meaningful software", but you don't, and my evidence is Facebook, maybe the most influential software ever written.

[0]: https://research.swtch.com/plmm

floobertoober 878 days ago

Aliasing is one example of the huge-array-of-bytes abstraction being broken
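A small sketch of what that means in practice, assuming `float` is 32 bits wide. Reading a `float` object through a `uint32_t *` violates the effective-type ("strict aliasing") rules and is undefined behavior, even though the bytes are right there in memory. Copying the bytes with `memcpy` is the well-defined way to get at the representation. The function name is made up for the example.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Return the bit pattern of a float without breaking the aliasing rules.
 * The tempting alternative, *(uint32_t *)&f, is undefined behavior. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the object representation byte by byte */
    return bits;
}
```

On a platform using IEEE 754 binary32 (an assumption, though a very common one), `float_bits(1.0f)` is `0x3F800000`.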

vacuity 878 days ago

A very simplified and possibly wrong answer: memory in C is a bunch of arrays of bytes that have lots of rules on when and how you can use a particular array given the state of all relevant arrays.