by jreck 2823 days ago
Except it isn't, not really.

Even just the distinction between the stack & heap is wrong. They aren't different things, just different functions called on the otherwise identical memory. It's why things like Go work fine, because the stack isn't special. It's just memory.

malloc & free are also totally divorced from how your program interacts with the OS memory allocator, even. GC'd languages don't necessarily sit on malloc/free, so it's not like that's an underlying building block. It's simply a different building block.

So what are you trying to teach people, and is C really the way to get that concept across? Is the concept even _useful_ to know?

If you want to write fast code, which is what you'll commonly drop to C/C++ to do, then just learning C won't get you any closer to doing that. It won't teach you branch predictors, cache locality, cache lines, prefetching, etc., all of which are critical to going fast. It won't teach you data-oriented design, which is a major factor in things like game engines. It won't teach you anything that matters about modern CPUs. You can _learn_ all that stuff in C, but simply learning C won't get you that knowledge at all. It'll just teach you about pointers and about malloc & free. And about heap corruption. And stack corruption.
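To make the cache-locality point concrete, here is a minimal sketch (the function names and the 512x512 grid size are made up for illustration). Both functions compute the same sum; the only difference is the order in which they touch memory. On typical cached hardware the row-major walk tends to be faster, but nothing in the C language itself tells you that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid size, chosen only for illustration. */
enum { ROWS = 512, COLS = 512 };

/* Fill the grid with small, predictable values. */
void fill_grid(int (*grid)[COLS]) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Visits elements in the order C stores them (consecutive addresses),
 * so every cache line that is fetched gets fully used. */
long sum_row_major(int (*grid)[COLS]) {
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Same arithmetic, but each step jumps COLS * sizeof(int) bytes ahead,
 * so consecutive accesses usually land on different cache lines. */
long sum_col_major(int (*grid)[COLS]) {
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}
```

Timing the two calls on a `static int grid[ROWS][COLS]` is left to the reader; the size of the gap depends on the CPU, the compiler flags, and the grid size.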

5 comments

> Even just the distinction between the stack & heap is wrong. They aren't different things, just different functions called on the otherwise identical memory. It's why things like Go work fine, because the stack isn't special. It's just memory.

To add to this: I have seen people who learned C and thought it to be "close to the metal" genuinely believe that stack memory was faster than heap memory. Not just allocation: they thought that stack and heap memory were somehow different kinds of memory with different performance characteristics.

And the C abstract machine maps just fine to computers where the heap and stack are separate, unrelated address spaces, so this isn't even necessarily mistaken reasoning for someone who just knows C.

Separate stack and data memory address spaces will make the machine incompatible with ISO C due to the impossibility of converting between "pointer to void" and "pointer to object". Code address space is allowed to be separate.

> malloc & free are also totally divorced from how your program interacts with the OS memory allocator, even. GC'd languages don't necessarily sit on malloc/free, so it's not like that's an underlying building block. It's simply a different building block.

The realization that malloc is really just kind of a crappier heavy-manual-hinting-mandatory garbage collector was a real eye-opener in my college's "implement malloc" project unit.

(To clarify: the malloc lib is doing a ton of housekeeping behind the scenes to act as a glue layer between the paging architecture the OS provides and high-resolution, fine-grained byte-range allocation within a program. There's a lot of meat on the bones of questions like sorting memory allocations to make free block reunification possible, when to try reunification vs. keeping a lot of small blocks handy for fast handout on tight loops that have a malloc() call inside of them, how much of the OS-requested memory you reserve for the library itself as memory-bookkeeping overhead [the pointer you get back is probably to the middle of a data structure malloc itself maintains!], minimization of cache misses, etc. That can all be thought of as "garbage collection," in the sense that it prepares used memory for repurposing; Java et al. just add an additional feature that they keep track of used memory for you without heavy hinting via explicit calls to malloc() and free() about when you're done with a given region and it can be garbage-collected).
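A toy version of that housekeeping might look like the sketch below. It is not how any real malloc is implemented; the names (`toy_malloc`, `toy_free`, `toy_usable_size`), the 4096-byte arena, and the 16-byte alignment are all assumptions made for illustration, and the arena is obtained with `aligned_alloc` as a stand-in for pages a real allocator would request from the OS. It does show two of the points above: the returned pointer sits just past a header the allocator maintains, and freeing triggers reunification of adjacent free blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_SIZE 4096u   /* hypothetical arena size */
#define ALIGNMENT  16u     /* assumed alignment; must be a power of two */

/* Bookkeeping header stored immediately before every payload. */
typedef struct block {
    size_t size;           /* payload bytes that follow this header */
    int is_free;
    struct block *next;    /* next block in address order */
} block_t;

#define HEADER_SIZE \
    ((sizeof(block_t) + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1))

static block_t *head = NULL;

static size_t align_up(size_t n) {
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* First-fit allocation with block splitting. */
void *toy_malloc(size_t n) {
    if (n == 0 || n > ARENA_SIZE)
        return NULL;
    if (head == NULL) {
        /* Stand-in for asking the OS for memory (mmap/sbrk in a real lib). */
        head = aligned_alloc(ALIGNMENT, ARENA_SIZE);
        if (head == NULL)
            return NULL;
        head->size = ARENA_SIZE - HEADER_SIZE;
        head->is_free = 1;
        head->next = NULL;
    }
    n = align_up(n);
    for (block_t *b = head; b != NULL; b = b->next) {
        if (!b->is_free || b->size < n)
            continue;
        /* Split only if the remainder can hold a header plus some payload. */
        if (b->size >= n + HEADER_SIZE + ALIGNMENT) {
            block_t *rest = (block_t *)((unsigned char *)b + HEADER_SIZE + n);
            rest->size = b->size - n - HEADER_SIZE;
            rest->is_free = 1;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->is_free = 0;
        return (unsigned char *)b + HEADER_SIZE;
    }
    return NULL; /* arena exhausted */
}

/* The header lives right before the pointer the caller was given. */
size_t toy_usable_size(void *p) {
    block_t *b = (block_t *)((unsigned char *)p - HEADER_SIZE);
    return b->size;
}

/* Mark the block free, then merge every run of adjacent free blocks. */
void toy_free(void *p) {
    if (p == NULL)
        return;
    block_t *b = (block_t *)((unsigned char *)p - HEADER_SIZE);
    b->is_free = 1;
    for (block_t *cur = head; cur != NULL; ) {
        if (cur->is_free && cur->next != NULL && cur->next->is_free) {
            cur->size += HEADER_SIZE + cur->next->size;
            cur->next = cur->next->next;
        } else {
            cur = cur->next;
        }
    }
}
```

A request for 100 bytes is rounded up to 112 here, and after everything has been freed the arena collapses back into a single block. Real allocators add size classes, per-thread caches, and much smarter policies on top of this basic idea.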

Stack accesses _are_ different in hardware these days, which is why AArch64 brings the stack pointer into the ISA level vs AArch32, and why on modern x86 using RSP like a normal register devolves into slow microcoded instructions. There's a huge complex stack engine backing them that does in fact give you better average access times than regular fetches to cache as long as you use it like a stack, with stack-like data access patterns. The current stack frame can almost be thought of as L½.

The stack pointer is just that, a pointer. It points to a region of the heap. It can point anywhere. It's a data structure the assembly knows how to navigate, but it's not some special thing. You can point it anywhere, and change that whenever you want. Just like you can with any other heap-allocated data structure.

It occupies the same L1/L2 cache as any other memory. There are no decreased access times or fetches other than the fact that it just happens to be more consistently in L1 due to access patterns. And this is a very critical aspect of the system, as it also means it page faults like regular memory, allowing the OS to do all sorts of things (grow on demand, various stack protections, etc.).
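One way to see the "you can point it anywhere" claim in action is to run a function on a stack that came from malloc. The sketch below assumes a POSIX system that still provides the `<ucontext.h>` functions (glibc does; they are obsolescent in POSIX). The function names are made up for illustration. It says nothing about the hardware stack engine either way; it only shows that a local variable ends up inside an ordinary heap block once the stack pointer is aimed there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, callee_ctx;
static uintptr_t observed_local_addr;

/* Runs on whatever stack the context was given and records where
 * one of its local variables lives. */
static void record_local_address(void) {
    volatile int local = 42;
    observed_local_addr = (uintptr_t)&local;
}

/* Returns 1 if the local variable lived inside the malloc'd block,
 * 0 if it did not, and -1 on setup failure. */
int local_lives_in_malloced_stack(size_t stack_size) {
    unsigned char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    if (getcontext(&callee_ctx) != 0) {
        free(stack);
        return -1;
    }
    callee_ctx.uc_stack.ss_sp = stack;       /* stack = plain heap memory */
    callee_ctx.uc_stack.ss_size = stack_size;
    callee_ctx.uc_link = &caller_ctx;        /* resume the caller on return */
    makecontext(&callee_ctx, record_local_address, 0);

    /* Switch stacks, run the function, and come back here. */
    if (swapcontext(&caller_ctx, &callee_ctx) != 0) {
        free(stack);
        return -1;
    }

    uintptr_t lo = (uintptr_t)stack;
    uintptr_t hi = lo + stack_size;
    int inside = observed_local_addr >= lo && observed_local_addr < hi;
    free(stack);
    return inside;
}
```

Calling `local_lives_in_malloced_stack(64 * 1024)` should report that the local was inside the block. Runtimes with growable, heap-allocated stacks rely on the same freedom.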

Google "stack engine". Huge portions of the chip are dedicated to this; if it makes you feel better you can think of it as fully associative store buffers optimized for stack-like access. And all of this is completely separate from regular LSUs.

There's a reason why SP was promoted to a first class citizen in AArch64 when they were otherwise removing features like conditional execution.

That's also the reason why using RSP as a GPR on x86 gives you terrible perf compared to the other registers: it flips back and forth between the stack engine and the rest of the core and has to manually synchronize in ucode.

EDIT: Also, the stack is different to the OS generally too. On Linux you throw in the flags MAP_GROWSDOWN | MAP_STACK when building a new stack.
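For reference, a minimal Linux-only sketch of a mapping created with those two flags is shown below. The wrapper names are hypothetical, and this is not a claim about what any particular threading library does internally; it only shows that the flags are passed to `mmap` like any others and that the resulting region is read and written like normal memory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous, private region using the flags named above.
 * Returns the lowest address of the region, or NULL on failure. */
void *map_stack_region(size_t size) {
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK | MAP_GROWSDOWN,
                      -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Release the region; returns 0 on success, like munmap. */
int unmap_stack_region(void *base, size_t size) {
    return munmap(base, size);
}
```

A stack pointer would start at the high end of the region (`base + size`), since stacks grow toward lower addresses on the common architectures.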

Would it be fair to say that it will teach you how to dictate the memory layout of your program, which is key to taking proper advantage of "cache locality, cache lines, prefetching, etc..."?
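As a sketch of what "dictating the memory layout" can mean in C, compare an array-of-structs with a struct-of-arrays for the same data. The particle type, its fields, and the count are invented for illustration. Summing one field from the first layout drags every other field through the cache; the second layout keeps the values being read contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element count, chosen only for illustration. */
enum { PARTICLE_COUNT = 1024 };

/* Array-of-structs: each particle's fields sit together (typically 64
 * bytes here), so reading only `mass` still pulls in everything else. */
typedef struct {
    float x, y, z;
    float mass;
    int flags;
    char name[44];
} particle_aos_t;

/* Struct-of-arrays: each field is its own contiguous array, so a pass
 * over `mass` touches only mass values. */
typedef struct {
    float x[PARTICLE_COUNT];
    float y[PARTICLE_COUNT];
    float z[PARTICLE_COUNT];
    float mass[PARTICLE_COUNT];
} particles_soa_t;

/* Strides over whole structs to reach each mass. */
float total_mass_aos(const particle_aos_t *particles, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += particles[i].mass;
    return total;
}

/* Reads a dense run of floats. */
float total_mass_soa(const particles_soa_t *particles, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += particles->mass[i];
    return total;
}
```

C lets you express either layout, but choosing between them still requires knowing how caches behave, which is the knowledge the parent comment says C alone won't give you.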
> It's why things like Go work fine, because the stack isn't special. It's just memory.

Care to elaborate?