Hacker News
by lordnacho 1069 days ago
> I used to think "C presents the most honest representation of the low-level mechanisms of the computer", but... even this is shaky. I've been programming for almost 15 years now, and I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address. The C representation of memory (and all the pointer arithmetic) is not a real representation of your hardware, and this too is an abstraction.

It's true that almost nothing works the way it's presented: the computer doesn't necessarily execute the instructions you wrote, it executes the machine instructions the compiler produced. It doesn't necessarily even execute them in the order they are specified. The memory isn't actually one big contiguous space, it's mapped as virtual memory. The physical memory isn't used that way either: there's a hierarchy of caches, often with NUMA on top, between the CPUs and the actual memory.

But it's a useful abstraction. Partly because a lot of the above things are built so that the abstraction works. But also because we want it to look that way, and it's kinda natural to let programmers imagine a virtual machine that works that way.

4 comments

More importantly, it's also the abstraction that the CPU itself provides, not C. It'd be neat to be able to control all those things, but that's largely impossible, so I'll take the next best thing.
C presents a fairly honest representation of the low-level mechanisms of x86 assembly. The way assembly has drifted away from the actual CPU instructions is interesting, but not something a programmer will get much benefit from trying to deal with. Itanium was an interesting experiment, but the new set of instructions did not offer large gains in practice.
> I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address.

I may be being pedantic or outright wrong (since it's been a while since I used C), but I don't think C can address memory by individual bit.

You have to read one or more bytes from memory, twiddle the bits in them using C's bitwise operators (like &, |, ^ and ~), and then write the changed bytes back to memory at the same addresses you read them from. At least for the earlier C versions I used, this was the case, IIRC.

And to read and write those bytes, you do it via scalar variables like ints or longs, or via structs or arrays, or via pointers. Or using library functions like memset().
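
A minimal sketch of that read-twiddle-write cycle, for illustration only. The helper names (set_bit, clear_bit, test_bit) are made up here, and the sketch assumes 8-bit bytes and numbers bits from the least significant bit of each byte:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: C cannot address a single bit, so each one
 * reads the containing byte, changes it with bitwise operators, and
 * writes the whole byte back. Bit 0 is the LSB of buf[0] (assumption). */

static inline void set_bit(uint8_t *buf, size_t bit)
{
    buf[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

static inline void clear_bit(uint8_t *buf, size_t bit)
{
    buf[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

static inline int test_bit(const uint8_t *buf, size_t bit)
{
    return (buf[bit / 8] >> (bit % 8)) & 1u;
}
```

With a two-byte buffer, set_bit(buf, 10) touches only buf[1], since bit 10 lives in the second byte.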

Indeed, bytes are the smallest addressable unit, which is 8 bits in most architectures. You can't address a bit, so to do anything with it you have to get the byte it's in and twiddle.
Why do programmers in 2023 need to imagine a virtual machine (basically a PDP-11 from 1970-something) at all?

You only need that abstraction if you're doing low level bit/byte bashing and I/O, or there's some chance you may run out of memory and need to handle that manually.

That applies to a tiny slice of all possible applications.

There are far more useful modern abstractions that don't need to make those assumptions.

> basically a PDP-11 from 1970-something

That PDP-11 from the seventies had ADC/SBC (addition/subtraction with carry) in its instruction set, the result of MUL was twice the size of the inputs (i.e., multiplying two ints produced a long), and DIV produced both the quotient and the remainder. None of that is visible from C and yet people keep clamoring that "C is close to the metal". Bah, humbug: while the "*p++" and "*--p" idioms translate directly into addressing modes particular to the PDP-11 (most other architectures don't have autoincrement/autodecrement), there is no specific support for "*++p" or "*p--" in the machine itself.
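
To make the gap concrete, here is roughly the closest standard C gets to the double-width MUL and the combined DIV, as a sketch rather than a claim about what any compiler emits. The helper names mul_wide and div_both are hypothetical; div() is the standard library function from <stdlib.h>. Whether either one compiles down to a single machine instruction is up to the compiler and target.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: C's "*" yields a result of the operands' own
 * type, so to keep the full product you widen the operands first. */
static inline int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Hypothetical helper: the language operators "/" and "%" are
 * separate; the library's div() returns quotient and remainder
 * together in a div_t struct. */
static inline div_t div_both(int numer, int denom)
{
    return div(numer, denom);
}
```

There is still no portable way to spell "add with carry" at all, which is the commenter's point.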

Yeah that's true, and that's why people don't use C for stuff that isn't close to the metal. If you're just serving some web page you can just think about the business logic and a higher level language will deal with the rest for you.

But someone's got to write drivers and someone's got to write the thing that connects the higher levels to the metal.

Because when you are writing drivers for MCUs, you are writing into arbitrary pieces of memory at arbitrary addresses specified by the reference manual for your MCU. And when you write 0xABCD into memory address 0xF120, your UART will throw out 0xA, 0xB, 0xC, 0xD on a pin using clocks defined by register 0xF124, which is actually a divider definition from a VCO connected to an XTAL.
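
In C that kind of register write usually looks something like the sketch below. The addresses are just the illustrative ones from this comment, the 16-bit register width is an assumption, and the names (UART_DATA_ADDR, reg_write16, and so on) are hypothetical; a real driver takes all of these from the MCU's reference manual or vendor headers.

```c
#include <stdint.h>

/* Illustrative addresses from the comment above (not a real part). */
#define UART_DATA_ADDR 0xF120u
#define UART_DIV_ADDR  0xF124u

/* volatile tells the compiler every access has a side effect, so it
 * must not cache, merge, or drop the read or write. */
static inline void reg_write16(volatile uint16_t *reg, uint16_t value)
{
    *reg = value;
}

static inline uint16_t reg_read16(const volatile uint16_t *reg)
{
    return *reg;
}

/* Only meaningful on the target hardware: turn the fixed address into
 * a pointer and store through it. On a hosted OS this would fault. */
static inline void uart_send_on_target(uint16_t word)
{
    reg_write16((volatile uint16_t *)(uintptr_t)UART_DATA_ADDR, word);
}
```

On a desktop you can exercise reg_write16 and reg_read16 against an ordinary uint16_t variable standing in for the register; uart_send_on_target is what would run on the device.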

No amount of abstraction in any language will isolate you from such a memory model.