Hacker News new | ask | show | jobs
by jerf 3083 days ago
That's bad. A single 5% hit might not be the end of the world, but 5% here and 10% there and another 5% over there in the common case adds up badly enough. Doubly-pathological cases (indirect calling-heavy code calling lots of syscalls)... a 50% slowdown and a 30% slowdown combines to a 60% total slowdown. Yeowch.

Will be intrigued to see how processor manufacturers respond to this. If they were even slightly relaxed about it prior to disclosure I expect there's going to be some very hurried attempts to engineer some solutions pronto. This is the sort of thing where it might even be worth throwing away all of your future roadmap plans and just getting a revision of the current chips out there ASAP, whatever that may do to the rest of your roadmap.

1 comments

Sounds like it could be great for processor manufacturers. In the age where CPUs don't get faster, there's finally a reason for customers to buy new CPUs again!
Not really, once a program is compiled with -retpoline, new hardware won't bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways, will continue taking priority.

I mean, it's a compiler flag right, obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.

Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.

Horizontal scaling though. If every individual processor is slower, more are needed.
Not quite - lots of "serious" applications these days are written to target JIT compilers, which would be capable of switching retpoline on and off depending on need.
Funnily enough, I ended up not including a PS starting with "A sufficiently smart JIT, however..." ;)
I'd rather have linkers go down a similar road that the Linux kernel went on a over a decade ago: provide binary patches in a table (essentially alternative machine code) and have the linker patch the correct alternative depending on the CPU and it's bugs. The Linux kernel already contains an "alternatives" segment which is exactly this kind of list of patches. It would be trivial to add such a table to ELF and PE formats and have the runtime linker process that while it's plowing through the code anyway.
Something like this exists with function multi-versioning: https://lwn.net/Articles/691932/

For example, glibc chooses optimised machine code for memcpy depending on the CPU it runs on.

New CPUs could just convert the retpoline back to the original jump in microcode, and enable the now timing-attack safe branch predictor.
But even then a performance hit remains due to the increased code size of the instruction sequence.
There's a lot of space left in code already to insert trampolines later. And in the end of the day most memory is data, not code.

And eventually, this code will get replaced anyway (just like today there are often multiple code paths in binaries, and a lot of code is compiled for host anyway).

In any case, the performance impact of a couple extra bytes per indirect call is small compared to disabling branch target prediction.

Makes me wonder if that's Intel's PR plan (hence their spin on the story). Assuage the general consumer and mainstream media, point the finger at the OS vendors if necessary, and fix the bug in their next-gen chips.

When this all shakes out, the general story is going to be "upgrade sooner, current-gen Intel chips are x% faster", where x is going to be a larger number than it was a week ago.

It's more-or-less the Apple battery story all over again. Current devices are going to be slower, newer ones are going to be faster. Even if you know the why and the how, you're still in the same place as everybody else (at best, you could upgrade just the chip if your MB is new enough, but you're still buying Intel). Unless there's some clear way of imposing the external cost of this bug on Intel, it's a win-win for them.

Here's another potential PR spin: the major cloud vendors got out ahead of this so that they could point the finger at Intel, making customers not ask "Why did you sell me this fundamentally insecure server for so many years?"

Because this isn't really that new nor is it really a bug. Meltdown could be a bug because the asynchronous access of memory in other protection rings is unsafe, but the rest of it is just a normal side-channel attack, an abuse of otherwise-innocent data.

How much moral culpability should rest on the proprietors of software virtualization technologies that don't really safely encapsulate anything due to hardware incongruencies with the modern "sandboxed computing" model?

Surely there have been engineers over the years who've questioned the propriety of misleading users into seeing VMs as fully-encapsulated systems when the hardware just fundamentally doesn't support that type of native encapsulation. Such persons have probably spent the last several years being shunned for being old fogies overly attached to their rust buckets. It'd be interesting to hear some of their stories.

> Even if you know the why and the how, you're still in the same place as everybody else

If you know then you at least have the option to turn off PTI and compile your performance-critical binaries without retpoline.

And maybe disable microcode updates if intel's updates eat another few percent.

Multicore performance is getting faster at a very high pace.

Ryzen 1700X has 3.3 times the multicore performance of my i5 3450 from 2012 for almost the same price. With 7nm we will see at least another 50% increase in multicore performance on top of that.