| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bignerd_95 279 days ago

As someone who teaches this stuff at university, I see students getting confused every single year by how textbooks draw memory. The problem is mostly visual, not conceptual.

Most diagrams in books and slides use an old hardware-centric convention: they draw higher addresses at the top of the page and lower addresses at the bottom. People sometimes justify this with an analogy like “floors in a building go up,” so address 0x7fffffffe000 is drawn “higher” than 0x400000.

But this is backwards from how humans read almost everything today. When you look at code in VS Code or any other IDE, line 1 is at the top, then line 2 is below it, then 3, 4, etc. Numbers go up as you go down. Your brain learns: “down = bigger index.”

Memory in a real Linux process actually matches the VS Code model much more closely than the textbook diagrams suggest.

You can see it yourself with:

cat /proc/$$/maps

(pick any PID instead of $$).

...

[0x00000000] lower addresses

...

[0x00620000] HEAP start

[0x00643000] HEAP extended ↓ (more allocations => higher addresses)

...

[0x7ffd8c3f7000] STACK top (<- stack pointer)

                  ↑ the stack pointer starts here and moves upward

                  (toward lower addresses) when you push

[0x7ffd8c418000] STACK start

...

[0xffffffffff600000] higher addresses

...

The output is printed from low addresses to high addresses. At the top of the output you'll usually see the binary, shared libs, heap, etc. Those all live at lower virtual addresses. Farther down in the output you'll eventually see the stack, which lives at a higher virtual address. In other words: as you scroll down, the addresses get bigger. Exactly like scrolling down in an editor gives you bigger line numbers.

The phrases “the heap grows up” and “the stack grows down” aren't wrong. They're just describing what happens to the numeric addresses: the heap expands toward higher addresses, and the stack moves into lower addresses.

The real problem is how we draw it. We label “up” on the page as “higher address,” which is the opposite of how people read code or even how /proc/<pid>/maps is printed. So students have to mentally flip the diagram before they can even think about what the stack and heap are doing.

If we just drew memory like an editor (low addresses at the top, high addresses further down) it would click instantly. Scroll down, addresses go up, and the stack sits at the bottom. At that point it’s no longer “the stack grows down”: it’s just the stack pointer being decremented, moving to lower addresses (which, in the diagram, means moving upward).

3 comments

krackers 279 days ago

The stack does grow down though no matter what, in the sense that the pushing decrements the stack pointer. You can represent this as "up" in your diagram, but I don't think this makes it any easier conceptually because by analogy to a simple push/pop on an array, you'd naively expect higher addresses to contain more recent stack contents.

The core of the issue is that the direction stack growth differs from "usual" memory access patterns which usually allocate from lower to higher addresses (consider array access, or how strings are laid out in memory. And little-endian systems are the majority)

But if we're going with visualization options I prefer to visualize it horizontally, with lower addresses on left. This has a natural correspondence with how you access an array or lay out strings in memory.

link

bignerd_95 278 days ago

Please try to draw, step by step, a process where lower addresses are at the top and higher addresses are at the bottom. You’ll see that this makes everything much easier to understand.

Do not confuse this with push and pop on an abstract stack data structure. That is not the same as the process stack. On a real process stack, newer data is stored at LOWER addresses. In fact, every push decrements the stack pointer (the address is decreased).

If you want an example, think about how a string is placed and accessed on the stack. First, the stack pointer is decremented to reserve space (so in my diagram this “moves up” visually). Then the string can be read byte by byte by incrementing an index from the lower address toward the higher address. This is exactly like reading a book: left to right, top to bottom. If you flip memory upside down, everything becomes unnatural to understand: you would have to read the string from the bottom to the top.

Try decompiling a program with Ghidra. Open the disassembly view and look at the addresses on the left. Lower addresses are shown at the top. Higher addresses are shown at the bottom. In my diagram this matches perfectly. Everything is consistent and you never have to mentally flip the memory layout.

Years of practice led me to this, not just theory.

link

amitprasad 279 days ago

I think I got stuck in the same rut that I learned address space in whilst writing that diagram. I would tend to agree with you that your model makes much more sense to the student.

Related: In notation, one thing that I used to struggle with is how addresses (e.g. 0xAB_CD) actually have the bit representation of [0xCD, 0xAB]. Wonder if there's a common way to address that?

link

bignerd_95 279 days ago

If you're referring to little-endianness, it means the CPU stores multi-byte values in memory with the least significant byte first (at the lowest address).

This convention started on early Intel chips and was kept for backward compatibility. It also has a practical benefit: it makes basic arithmetic and type widening cheaper in hardware. The "low" part of the value is always at the base address, so the CPU can load 8 bits, then 16 bits, then 32 bits, etc. starting from the same address without extra offset math.

So when you say an address like 0xABCD shows up in memory as [0xCD, 0xAB] byte-by-byte, that's not the address being "reversed". That's just the little-endian in-memory layout of that numeric value.

There are also big-endian architectures, where the most significant byte is stored at the lowest address. That matches how humans usually write numbers (0xABCD in memory as [0xAB, 0xCD]). But most mainstream desktop/server CPUs today are little-endian, so you mostly see the little-endian view.

link

amitprasad 279 days ago

Not so much the confusion of what little endian is, but how we tend to represent it in notation. Of course this confusion was back when I was first learning things in high school, but I imagine I’m not alone in it

link

bignerd_95 279 days ago

Yes, I reached the same conclusions the hard way while exploiting memory corruption bugs. Once I understood how misleading these representations can be, everything finally became clear.

About the address notation you're describing, I'm not sure I fully get the problem. Can you spell out the question with a concrete example?

This is what the address space of a real bash process looks like on my machine:

$ cat /proc/$(pidof bash)/maps

5e6e8fd0f000-5e6e8fd3f000 r--p 00000000 fc:00 3539412 /usr/bin/bash

5e6e8fd3f000-5e6e8fe2e000 r-xp 00030000 fc:00 3539412 /usr/bin/bash

5e6e8fe2e000-5e6e8fe63000 r--p 0011f000 fc:00 3539412 /usr/bin/bash

5e6e8fe63000-5e6e8fe67000 r--p 00154000 fc:00 3539412 /usr/bin/bash

5e6e8fe67000-5e6e8fe70000 rw-p 00158000 fc:00 3539412 /usr/bin/bash

5e6e8fe70000-5e6e8fe7b000 rw-p 00000000 00:00 0

5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]

7ec3d1400000-7ec3d16eb000 r--p 00000000 fc:00 3550901 /usr/lib/locale/locale-archive

7ec3d1800000-7ec3d1828000 r--p 00000000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6

7ec3d1828000-7ec3d19b0000 r-xp 00028000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6

7ec3d19b0000-7ec3d19ff000 r--p 001b0000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6

7ec3d19ff000-7ec3d1a03000 r--p 001fe000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6

7ec3d1a03000-7ec3d1a05000 rw-p 00202000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6

7ec3d1a05000-7ec3d1a12000 rw-p 00000000 00:00 0

7ec3d1a2b000-7ec3d1a84000 r--p 00000000 fc:00 3549063 /usr/lib/locale/C.utf8/LC_CTYPE

7ec3d1a84000-7ec3d1a85000 r--p 00000000 fc:00 3549069 /usr/lib/locale/C.utf8/LC_NUMERIC

7ec3d1a85000-7ec3d1a86000 r--p 00000000 fc:00 3549072 /usr/lib/locale/C.utf8/LC_TIME

7ec3d1a86000-7ec3d1a87000 r--p 00000000 fc:00 3549062 /usr/lib/locale/C.utf8/LC_COLLATE

7ec3d1a87000-7ec3d1a88000 r--p 00000000 fc:00 3549067 /usr/lib/locale/C.utf8/LC_MONETARY

7ec3d1a88000-7ec3d1a89000 r--p 00000000 fc:00 3549066 /usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES

7ec3d1a89000-7ec3d1a8a000 r--p 00000000 fc:00 3549070 /usr/lib/locale/C.utf8/LC_PAPER

7ec3d1a8a000-7ec3d1a8b000 r--p 00000000 fc:00 3549068 /usr/lib/locale/C.utf8/LC_NAME

7ec3d1a8b000-7ec3d1a8c000 r--p 00000000 fc:00 3549061 /usr/lib/locale/C.utf8/LC_ADDRESS

7ec3d1a8c000-7ec3d1a8d000 r--p 00000000 fc:00 3549071 /usr/lib/locale/C.utf8/LC_TELEPHONE

7ec3d1a8d000-7ec3d1a90000 rw-p 00000000 00:00 0

7ec3d1a90000-7ec3d1a9e000 r--p 00000000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4

7ec3d1a9e000-7ec3d1ab1000 r-xp 0000e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4

7ec3d1ab1000-7ec3d1abf000 r--p 00021000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4

7ec3d1abf000-7ec3d1ac3000 r--p 0002e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4

7ec3d1ac3000-7ec3d1ac4000 rw-p 00032000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4

7ec3d1ac4000-7ec3d1ac5000 r--p 00000000 fc:00 3549065 /usr/lib/locale/C.utf8/LC_MEASUREMENT

7ec3d1ac5000-7ec3d1ac6000 r--p 00000000 fc:00 3549064 /usr/lib/locale/C.utf8/LC_IDENTIFICATION

7ec3d1ac6000-7ec3d1acd000 r--s 00000000 fc:00 3548984 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache

7ec3d1acd000-7ec3d1acf000 rw-p 00000000 00:00 0

7ec3d1acf000-7ec3d1ad0000 r--p 00000000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

7ec3d1ad0000-7ec3d1afb000 r-xp 00001000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

7ec3d1afb000-7ec3d1b05000 r--p 0002c000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

7ec3d1b05000-7ec3d1b07000 r--p 00036000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

7ec3d1b07000-7ec3d1b09000 rw-p 00038000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]

7ffd2678a000-7ffd2678e000 r--p 00000000 00:00 0 [vvar]

7ffd2678e000-7ffd26790000 r-xp 00000000 00:00 0 [vdso]

ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]

___

Each line is a memory mapping. The first field is the start address. The second field is the end address. So an entry like

7ffd266f8000-7ffd26719000

means "this mapping covers virtual addresses from 0x7ffd266f8000 up to 0x7ffd26719000."

The addresses are always increasing:

- left to right: within a single line you go from lower address to higher address

- top to bottom: as you go down the list you also go to higher and higher addresses

Exactly like reading a book: left to right and then top to bottom.

link

1718627440 279 days ago

The issue amitprasad is pointing out is when you read addresses byte-wise and you determine that they are in little-endian.

link

1718627440 279 days ago

That's how stacks on my desk grow and how everything grows in reality. I wouldn't numerate stacked things on my desk from the top, since this constantly changes. You also wouldn't name the first branch of a tree (the plant) to be the top-most one.

In your example "the stack grows down", seems to be wrong in the image.

link

bignerd_95 279 days ago

Thanks! I tried to rewrite the final sentence

link

1718627440 279 days ago

Yeah, but does that really help? The phrases "growing down/up" still exist and now you defined them to mean the opposite. This issue still didn't go away, since heap and stack still grow in different directions. Can't you just start drawing from the bottom of the blackboard, and it will be obvious? Coordinate systems also typically work that way.

link

bignerd_95 279 days ago

Yes, I draw the heap starting at the top of the board and the stack starting at the bottom of the board and grow them toward each other. That works fine in a one-off explanation.

The problem is that most textbooks draw the opposite, so the student leaves my lecture, opens a book or a slide deck, and now “down” means a different thing.

It gets worse when they get curious and look at a real process with /proc/<pid>/maps. Linux prints mappings from low address to high address as you scroll down (which matches my representation). That is literally reversed from the usual textbook diagram. Students notice and ask why the book is “wrong.”

So I've learned I have to explicitly call this out as notation.

Same story as in electronics class still teaching conventional current flow (positive to negative), even though electrons move the other way (negative to positive). Source: https://www.allaboutcircuits.com/textbook/direct-current/chp.... Historical convention, and then pedagogy has to patch it forever.

link

ofalkaed 279 days ago

Starting at the bottom of the blackboard would be backwards from how it prints in the terminal when you cat /proc/<pid>/maps.

link

1718627440 278 days ago

The way it's printed in the terminal is honestly backwards to me. This seams to only come from the scroll direction of the terminal, and because this is not a drawing, but a simple list. Every other tool like a debugger show it in the opposite direction and in all illustrations I have read it's that way too.

link

bignerd_95 278 days ago

So you use debuggers. Good. Then you can confirm that the program counter is incremented after each instruction, and that you read assembly from top to bottom. That means smaller addresses are at the top and larger addresses are at the bottom. This matches my layout, and it also matches what you see in the terminal in /proc/<pid>/maps.

link