Hacker News new | ask | show | jobs
by mghfreud 1641 days ago
I am lost here, the mentioned bugs are a result of optimizations like speculative execution, branch prediction, prefetching etc.

These are language independent optimizations. For example, any language (that allow for loop like constructs) compiled to intel machine code and executed on intel processor will be exposed to these bugs, it is not C specific. Am I missing anything?

5 comments

> any language (that allow for loop like constructs) compiled to intel machine code and executed on intel processor will be exposed to these bugs, it is not C specific.

Well, that's kind of the point, that Intel/x86 and most/all modern processors implement an abstract machine that's basically made for C, papering over the underlying instruction parallelism and absurdly complex memory model with all kinds of crazy front-end decoder business to allow the CPU to ingest that machine code and pretend like instructions are executed in order, with C-style control flow.

You could (in principle) prevent these sorts of bugs by creating a new kind of machine, but that machine would be incapable of running C software, at least efficiently. There are several ideas being alluded to here, another is the idea of directly exposing the underlying instruction level parallelism -- that's been attempted before in VLIW processors like Intel's Itanium chips. You could make the argument a big part of their problem was at the compiler level, trying to map C-style semantics to the CPU, while maybe a different language/compiler would have extracted more performance.

Trying to summarize the author's idea, modern CPUs have a lot of hidden potential behind a restrictive "virtual machine". If that layer were stripped clean, we could (the idea goes) get more performance and parallelism, and potentially more security, at the cost of compatibility/interoperability with legacy software.

> Intel/x86 and most/all modern processors implement an abstract machine that's basically made for C, papering over the underlying instruction parallelism and absurdly complex memory model with all kinds of crazy front-end decoder business to allow the CPU to ingest that machine code and pretend like instructions are executed in order, with C-style control flow.

The issue is that none of this has to do with C, really at all. C relies on the semantics exposed by the machine code. The ISA does not expose speculative execution or pipelining. C, and asm, and literally all software conforms to the ISA because that's all there is.

Calling out C specifically is just clickbait, IMO. The author makes a great point about how the x86 ISA may not be a great abstraction for modern CPUs.

As mentioned under the older thread of this same article, many hardware APIs suffer from having to provide a C-compatible interface/memory model. I remember reading that a particular GPU’s elegant memory model was butchered so that C-programmers could do something with it? My memory is hazy on the details though.
> I am lost here, the mentioned bugs are a result of optimizations like speculative execution, branch prediction, prefetching etc.

>

> These are language independent optimizations. For example, any language (that allow for loop like constructs) compiled to intel machine code and executed on intel processor will be exposed to these bugs, it is not C specific. Am I missing anything?

1. There's machine code that is exposed to the ISA (public machine code that compilers generate) and there's machine code that exists and is used but is not exposed.

2. The author is making the argument that the machine code that is exposed is designed around the memory model of the C programming language, which itself was designed around the memory model of the PDP.

Put the above two together and (if you squint really hard, and ignore things like logic and reason) the conclusion is that the modern x86/X64 ISA is suboptimal because of the PDP.

The actual reality is that all the popular programming languages are imperative, have the concepts of stack, heap and in-order execution of instructions.

Because all languages appear to converge on the same basic concepts in order to be commonly accepted, I think that it is doubtful that any alternative machine and memory model would have arose in the absence of a language like C or a machine like the PDP.

I think this because of the existence of other languages that offer alternative machine and/or memory models, and those languages have existed for decades without being popular.

> The actual reality is that all the popular programming languages are imperative, have the concepts of stack, heap and in-order execution of instructions. Because all languages appear to converge on the same basic concepts in order to be commonly accepted, I think that it is doubtful that any alternative machine and memory model would have arose in the absence of a language like C or a machine like the PDP.

But as we can see, this model could not keep up with performance improvements so much more complexity got implemented beneath the surface of the old model. The author’s point is to be aware of the mismatch here, and that perhaps we should stop believing the “lies” what C tells us.

I personally believe that we would be much better off with lower level instructions exposed to us, and putting the complexity in software. That way CPU vulnerabilities could be patched, and I believe we could create much better optimizations, and faster CPU design iterations.

At least some of this complexity and abstraction layer provided by the hardware is done so that the same binary can run on multiple different implementations, though. If you expose low level instructions corresponding to the specific CPU implementation then you lose "this app runs on all these Android phones" and also "this process can migrate between CPUs in a big/little setup", which would be unfortunate.
Well, I meant it more in terms of x86 to microcode JIT compilers, but in software. So even existing code can potentially be run in exactly the same way they do know, but instead of cumbersome hardware pipelines, these could be done entirely in software where the complexity ceiling is perhaps somewhat higher. This JIT compiler could do the same “magic” what current CPUs do, reorder, branch predict, etc and even more, while in case of a bug those can be fixed without buying a new processor.
Ah, so a Transmeta style approach? That's certainly feasible, in the sense that their technology worked, but I think it would be tricky at best to match the performance of the more standard do-it-in-hardware approach.
> I personally believe that we would be much better off with lower level instructions exposed to us, and putting the complexity in software

There have been several initiatives that sound vaguely like that, none of which actually worked out commercially. Principally Itanium. At the same time, there is no question that it's possible to gain a lot in performance if you're both willing and able to use a programming environment like CUDA.

It seems to me that the article never actually articulates an alternative in enough detail to take seriously. It doesn't seem to make any falsifiable claim.

In my view the most plausible explanation for why things in this area look the way they look was articulated in DJB's "The Death of Optimizing Compilers" talk. I'm not surprised that this ACM piece was written by somebody that works on optimizing compilers. Perhaps that shouldn't be relevant, but I can't help but suspect that it is.

The bugs are not caused by speculative execution.

The bugs are caused by programmers believing they have control over memory addresses and registers in the CPU only because they are hardcore programmers writing in “low level C”. So when they peek behind the abstraction, the ship now has two captains, one being the programmer and one being the compiler. When both of them hit the gas at the same time all the undefined behavior bugs appear.

A bit extreme analogy but you could compare it to writing code without mutex locks because code will only be executed in single thread, then go multithreading anyway. On an old cpu and old compiler it will work, but once you rev up it will crash and burn. In the case of C, the language spec always catered for this possibility, but many programmers thought they were smarter than the compiler leading to todays bugs.

Other languages neither expose the control nor the temptation to insert such sticks into the machinery. Both sides of the abstraction have a much more mutual understanding of where the border of the language ends. With taller guardrails in place to prevent stepping over it.

> The bugs are not caused by speculative execution.

Actually, they are.

It seems like you're arguing against the existence of any kind of low-level hardware/software interface. C is strongly associated with that interface, but it isn't the same thing.

If hardware is significantly more advanced, but C is largely the same as it was in PDP-11 days, then C now is relatively speaking less able to control the low level compared to back then.

C has decent escape hatches to direct machine control via a simple ABI and even inline assembly, but the behavior of C code itself is arguably under-specified on modern hardware.

Being under-specified towards the hardware is a feature, to allow optimization.

Now I’m not arguing all the UB in C is great, in fact the opposite. But that should be prohibited on the language layer and not on the machine layer. See rust for a better solution.

100% specification would be a mis-feature. But you can't easily use the C language to tell the compiler that your ternary assignment has a 50/50 probability for each branch, meaning it probably should issue an instruction like cmov on x86. You can use non-standard builtins to say it's 99/1 so it should just branch predict, and those builtins are so ubiquitous it's not such a big deal, but still not technically possible with just the standard C language.

In a PDP-11 world, there was no need for such things. In the superscalar predictive world of today, there are a lot of such things you need to specify with non-standard builtins and inline assembly. C gets the job done, but basically ever since the Pentium, it has relied more and more on those non-standard escape hatches, in order to be the best portable assembler. It's still the best compared to other languages, but in absolute terms, it is less good at that task with every hardware generation. Rust doesn't improve on this by the way.

That's the obvious stuff. Compiler writers using UB footguns as one of the most powerful optimization tools is another problem on top of that. You might have to use signed indices into some bytes in order to let the compiler assume your index increment has no overflow (that would be UB), again just to be able to issue cmovs. That's an awkwardly indirect way to do that. Arguably C would be better off specifying addition operators or inline functions for unsigned integers that the programmer promises will never overflow. I don't use Rust, but my understanding is all integer types panic on overflow in debug, and wrap in release -- at least there is a non-stable unchecked_add.

Now I see what you mean by under-specified in the case of branch hints. But does the solution need to involve giving more direct machine access? Similar benefits can be given with higher layer abstractions as well, without adding more UB, in fact the examples below remove UB. Take restrict keyword in C, which theoretically could be automatically inferred in Rust (not sure if they actually do nowadays). Iterators, or as you say better addition operators, can hide the details of array indexing and overflows. Hinting the probability of a certain branch certainly sounds like a higher layer construct as well, not something to expose a direct machine instruction for.
I’m not proposing machine specific additions to C. One of my complaints is that the C language doesn’t have enough features, so I end up using machine specific assembly or compiler specific builtins. And I’m complaining about C pushing developers onto the knife edge of UB or performant code — I don’t want more UB either.

I want higher level constructs in the C language (restrict isn’t a great feature, but it’s high level, so like that) that can map to the actual feature set found on actual machines since 2004 (and compiler writers can do that mapping per machine). But C is stuck with a simplified model of the machine that ignores what almost all hardware can do these days.

I think I’ll always have to dip into assembly/intrinsics sometimes, so I’m not looking for super advanced/rich features. Actually, I think the real benefit would be giving compiler writers ways to improve performance without pushing devs onto the UB knife edge.

Sounds like he's saying that a low level language without invisible abstractions no longer exists for modern CPUs. Thus the calls to rethink both the hardware and language.