by jcranmer 633 days ago
> 1. It doesn't map almost 1:1 to assembly the way C does, so it's not inherently clear if the code will necessarily do what it says it does.

As someone who works on a C compiler, I will tell you that Rust maps marginally better 1:1 to assembly than C does. No major C compiler goes 1:1 to assembly; it all gets flushed into a compiler IR that happily mangles the code in fun and interesting ways before getting compiled into the assembly you get at the end. Rust code does that too, but at least Rust doesn't pull anything silly on you like the automatic type promotion that C does.

If C maps 1:1 to assembly in your view, then (unsafe) Rust does; if Rust doesn't map 1:1 to assembly, nor does C. It's as simple as that.
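To illustrate the automatic type promotion mentioned above, here is a minimal sketch. The function names are made up for this example, and it assumes 8-bit `unsigned char` and an `int` wider than 8 bits, which holds on mainstream targets. Operands narrower than `int` are promoted before the arithmetic happens, so the addition is never an 8-bit add at the language level.

```c
#include <stddef.h>

/* Both operands are promoted to int before the addition,
   so 200 + 100 does not wrap around here. */
int sum_promoted(unsigned char a, unsigned char b) {
    return a + b;
}

/* Converting the int result back to unsigned char reduces it
   modulo 256 (assuming CHAR_BIT == 8). */
unsigned char sum_truncated(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b);
}

/* The type of the expression a + b is int, not unsigned char. */
size_t promoted_size(void) {
    unsigned char a = 1, b = 2;
    return sizeof(a + b);
}
```

With these definitions, `sum_promoted(200, 100)` is 300 while `sum_truncated(200, 100)` is 44, even though the source expression `a + b` is the same in both.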


I get that GCC and Clang do all sorts of optimizations, but doesn't unoptimized C map closely to 1:1?

I've heard it called a high-level assembly that maps closely to assembly many times at this point, so it makes sense to me why people would say that.

> If C maps 1:1 to assembly in your view, then (unsafe) Rust does; if Rust doesn't map 1:1 to assembly, nor does C. It's as simple as that.

I thought the mapping issue was unrelated to the borrow checker, and that it's possible to write a borrow checker for a restricted subset of C. I thought the thing that was making it not map 1:1 was actually all of the extra features in Rust, like the ADTs and async and all of that. Is that not actually the case?

> but doesn't unoptimized C map closely to 1:1

What is a variable in C? A register? A memory location? The language doesn't have basic concepts needed to map anything 1:1 to assembly and the ones it has usually come with half a dozen standards worth of required error handling, because having single instruction features like sqrt return -1 on error wasn't enough.

> What is a variable in C? A register? A memory location?

Wouldn't it depend on the type? Something like:

int *p; p = &x;

MOV R1, R2 ; R1 contains the address of x, copy it into pointer p in R2

int *p; int value = *p;

MOV @R2, R0 ; Dereference pointer p (in R2), load the value into R0 (int value)

int x = 5;

MOV #5, -(SP) ; Push the value 5 onto the stack (stack-allocated int)

int x = 10; int y = x + 5;

MOV #10, R0 ; Load the immediate value 10 into register R0 (for x)

ADD #5, R0 ; Add 5 to the value in R0 (x + 5), store result in R0

or

MOV #10, -(SP) ; Push 10 onto the stack for x

MOV (SP), R0 ; Load x from stack into R0

ADD #5, R0 ; Add 5 to x

Whether a variable gets stack-allocated or register-allocated, it's still a pretty close mapping afaict. From my understanding the original C mapped closely to PDP-7 and then PDP-11 assembly. The original implementation and how it maps to PDP-11 could be used as a reference implementation.

The C standard does not reference the stack anywhere.

Depending on the optimization level, things can change. Without any optimizations, variables of “automatic storage duration”, such as local variables, may get placed on the stack. But with optimizations turned on, they may end up in a register, or not be stored anywhere at all, for example if they are initialized from an integer literal and never modified afterwards.
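As a small sketch of that point, take the `int x = 10; int y = x + 5;` snippet from earlier in the thread and wrap it in a function (the function name is made up for this example). Nothing in the source says where `x` and `y` live; that is up to the compiler and the flags.

```c
/* Hypothetical wrapper around the snippet discussed above. */
int add_five_to_ten(void) {
    int x = 10;     /* automatic storage duration; the standard does not say where it lives */
    int y = x + 5;  /* could be a load and an add, a register add, or just the constant 15 */
    return y;
}
```

Comparing the output of something like `gcc -S -O0` and `gcc -S -O2` on this function is a quick way to see the difference: the unoptimized build typically gives both variables stack slots, while the optimized build typically just returns 15.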

> I get that GCC and Clang do all sorts of optimizations, but doesn't unoptimized C map closely to 1:1?

Nope. There are actually a number of "optimizations" that get applied to "unoptimized C" code. For example, gcc decides to apply even/odd mathematical function laws to the math library functions even with -O0, and both gcc and clang are very happy to throw away "unused" code at -O0, which prevented me from doing jump table shenanigans.

C fundamentally has no idea of the distinction between registers and memory, and this is probably the most important distinction in modern assembly languages. It's especially obvious when you get to exotic architectures that have thousands of registers and a relatively thin memory pipe. Making a C compiler emit the assembly that you expected is a lot trickier than you might expect. When you need exactly some assembly, you'll find that most compiler engineers will tell you "the compiler won't guarantee that, please use assembly", while the people trying to do so often end up spiraling into a rant about how compiler writers are idiots who can't write working compilers because it won't give them the assembly they need.

> I thought the thing that was making it not map 1:1 was actually all of the extra features in Rust, like the ADTs and async and all of that. Is that not actually the case?

People use a variety of different definitions of "map 1:1", which makes it hard to really answer your question for certain. What you seem to be getting at is the notion that C's ABI is predictable. But there are plenty of C features whose mapping to assembly is as unpredictable as Rust's ADT or async features are: C's bitfields are the most notorious example, but I'd throw in variable arguments, atomics, and the new _BitInt into the mix. Which is to say, if you're an engineer for whom this stuff matters, you'll know how the compiler is going to handle these constructions for your targets of interest, but that's not the same as saying that those constructions will always work the same way on all targets.
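A minimal sketch of the bitfield case, with a made-up struct and made-up helper names. What the language pins down is the value behaviour of each field; the order of the fields within a storage unit, the padding, and the total size are left to the implementation, so the generated loads, shifts, and masks differ between targets.

```c
#include <stddef.h>

/* Hypothetical layout: 17 bits of payload in total. Where each field
   sits inside the underlying storage is implementation-defined. */
struct packet_flags {
    unsigned int kind : 3;
    unsigned int length : 5;
    unsigned int checksum : 9;
};

/* Storing into an unsigned 3-bit field keeps the value modulo 8. */
unsigned int store_kind(unsigned int value) {
    struct packet_flags f = {0};
    f.kind = value;
    return f.kind;
}

/* The size depends on the ABI; it is often 4 bytes but not guaranteed. */
size_t packet_flags_size(void) {
    return sizeof(struct packet_flags);
}
```

Here `store_kind(9)` yields 1 on every conforming compiler, but nothing in the source tells you which bits of which byte hold `kind`.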

Unoptimized C is not something anyone actually uses. And it maps less obviously to assembler than C with some optimizations, because C compilers in no-optimization mode generally do brain-dead things like allocating all variables on the stack.

ADTs and such don't actually make the mapping less obvious. Async kinda does, but again it's not hard to have at least some mental model of how an async function will turn into a state machine implementation. C, C++, and Rust are all about equal in terms of how well I can predict how a given function maps to assembly, which is that if I care, I need to check, but I'm rarely completely bamboozled by what I see.
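For the async mental model, here is a rough sketch written by hand in C. It is not what rustc actually generates, and every name in it is invented for illustration. The idea is only that locals which live across a suspension point move into a struct, and the function body becomes a switch over a state tag that is resumed by repeated polling.

```c
#include <stdbool.h>

/* Which suspension point the task is currently parked at. */
enum add_state { ADD_START, ADD_WAITING_SECOND, ADD_DONE };

/* Hypothetical task that waits for two inputs and adds them. */
struct add_task {
    enum add_state state;
    int first;   /* a local that must survive across the suspension */
    int result;
};

/* Resumes the task with one input. Returns true once the result is ready. */
bool add_task_poll(struct add_task *t, int input) {
    switch (t->state) {
    case ADD_START:
        t->first = input;               /* first "await" completes */
        t->state = ADD_WAITING_SECOND;
        return false;
    case ADD_WAITING_SECOND:
        t->result = t->first + input;   /* second "await" completes */
        t->state = ADD_DONE;
        return true;
    case ADD_DONE:
        return true;                    /* already finished; input is ignored */
    }
    return true;
}
```

Starting from `struct add_task t = { ADD_START, 0, 0 };`, the first `add_task_poll(&t, 10)` returns false, the second `add_task_poll(&t, 5)` returns true, and `t.result` is then 15.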