by crotchfire 811 days ago
> What about DIMMs with Error Correction Codes (ECC)? Previous work on DDR3 showed that ECC cannot provide protection against Rowhammer.

This is incredibly misleading. The paper they cite states:

> When the ECC detection is used correctly 0.65%-7.42% of all bit flips still cause silent corruptions... On setup AMD-1, uncorrectable errors crash the system.

The attacker will need to cause dozens of machine halts in order to achieve even a single exploitable bitflip. Dozens of machine halts is not something that goes undetected.
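
As a rough sanity check on that estimate, here is a back-of-the-envelope sketch in Python. It assumes (a simplification, not the paper's model) that each flip is independent and is silent with a fixed probability p taken from the 0.65%-7.42% range quoted above. The expected number of non-silent flips before the first silent one is then (1 - p) / p. Not every non-silent flip is a halt, since many are simply corrected and logged, so this counts noticed events rather than crashes. The function name is made up for illustration.

```python
def expected_noticed_before_silent(p_silent: float) -> float:
    """Mean number of non-silent flips before the first silent one.

    Assumes independent flips, each silent with probability p_silent
    (a geometric model; this is an illustrative assumption).
    """
    if not 0.0 < p_silent <= 1.0:
        raise ValueError("p_silent must be in (0, 1]")
    return (1.0 - p_silent) / p_silent


# The two ends of the range quoted from the paper.
for p in (0.0065, 0.0742):
    noticed = expected_noticed_before_silent(p)
    print(f"{p:.2%} silent: about {noticed:.0f} noticed flips first")
```

At the high end of the quoted range that is on the order of a dozen noticed events per silent flip, and at the low end well over a hundred.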

Kudos for calling out JEDEC's terrible behavior on the rowhammer question, but we should not be downplaying ECC as a near-term solution.

5 comments

> The attacker will need to cause dozens of machine halts in order to achieve even a single exploitable bitflip. Dozens of machine halts is not something that goes undetected.

Is there a process for the operations team managing the system to figure out that it was an attack and not just flaky hardware?

Memory bit flips are very rare.

Normally a memory error does not happen more than a few times per year, unless you have a huge amount of memory.

Therefore, two memory errors on the same day, whether correctable or uncorrectable, should be enough to trigger an immediate report to the user or administrator of the computer. Either there is an ongoing Rowhammer attack that must be stopped, or one of the memory modules is approaching its end-of-life due to aging and must be replaced before memory errors become very frequent.

At least on server computers, it should be easy to configure the logging system so that a second memory error in a day, even a correctable one, immediately sends an e-mail and/or an SMS to the administrator.
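
A minimal sketch of that rule in Python, assuming a Linux machine where the EDAC driver is loaded and exposes per-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/`. The function names, the polling approach, and the two-errors-in-24-hours default are illustrative choices, and sending the e-mail or SMS is left out.

```python
import glob
from datetime import datetime, timedelta


def read_edac_total():
    """Sum corrected + uncorrected error counters over all memory controllers.

    Assumes the Linux EDAC sysfs layout; returns 0 if no counters are found.
    """
    total = 0
    for name in ("ce_count", "ue_count"):
        for path in glob.glob(f"/sys/devices/system/edac/mc/mc*/{name}"):
            try:
                with open(path) as f:
                    total += int(f.read().strip())
            except (OSError, ValueError):
                continue  # counter unreadable; skip it
    return total


def should_alert(error_times, window=timedelta(days=1), threshold=2):
    """Return True if at least `threshold` errors fall within any one `window`."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    times = sorted(error_times)
    # Compare each error with the one `threshold - 1` positions later.
    for first, last in zip(times, times[threshold - 1:]):
        if last - first <= window:
            return True
    return False


# Example: two errors five hours apart trip the alert.
start = datetime(2024, 1, 1)
print(should_alert([start, start + timedelta(hours=5)]))
```

A poller would call `read_edac_total()` periodically, record one timestamp per new error whenever the total increases, and notify the administrator once `should_alert(...)` returns True.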

If that's the case, then I guess they would take the physical server offline. And if other machines started showing similar signs of failure, then they would analyze the logs for a possible Rowhammer attack?
Sure: you replace the hardware with brand new hardware and it keeps happening. Then you know it's not the hardware.
The same workload starts crashing after migrating to multiple machines?
Sounds like a process thing that would need to be developed by each team. So probably a mix of results there.
> The attacker will need to cause dozens of machine halts in order to achieve even a single exploitable bitflip. Dozens of machine halts is not something that goes undetected.

That holds if you're targeting a specific machine. If you're throwing the exploit at a few thousand machines shotgun-style, then you're still going to get your botnet - it'll just be smaller.

Can you point to any botnets which were built using rowhammer attacks?

Rowhammer and speculative execution attacks are incredibly labor-intensive and target-specific. They are targeted attacks for high-value targets.

I think the point is that people with thousands of machines are probably going to notice if a meaningful chunk of them start halting.
Yep, and desktop users will certainly notice. Only AMD has desktop (not workstation) ECC support.
If you are running Windows 10, random halts and the CPU getting hot won't seem suspicious.
Why do you need to target one person who has thousands of machines? What if I just want to pwn whatever random machines visit my dodgy website? Dismissing an exploit just because it only works some fraction of the time seems overly optimistic to me.
Thanks for this. One reason I bought ECC for my home desktop was specifically for protection against Rowhammer (Zen2 TR platform), and that line made my heart race a bit. Very misleading.
Any recommendations for client devices with ECC memory?
If it has ECC memory, it's going to be branded as a workstation or server or industrial device, not marketed as a consumer device.

Among consumer products, some AMD desktop CPUs and motherboards support ECC memory, and that's about it.

For desktops, ASRock motherboards seem to be the common choice for people wanting ECC memory.

It's specifically mentioned on the ASRock motherboard pages under "Specifications". Some random examples:

https://www.asrock.com/mb/AMD/B650%20Pro%20RS/index.asp#Spec...

https://pg.asrock.com/mb/AMD/B650%20PG%20Lightning%20WiFi/in...

https://www.asrock.com/mb/AMD/X670E%20Taichi/index.asp#Speci...

These all have:

    Supports DDR5 ECC/non-ECC, un-buffered memory up to 7200+(OC)
I think it's worth investigating the level of "support" these boards offer for ECC. The ASRock Taichi for example does not have any ECC DIMMs in its "qualified" list.
Interesting. Might be good for someone (not me!) to investigate and then write it up in depth. :)

As a data point, I'm using a previous generation ASRock AM4 motherboard with ECC and that definitely works.

I'm undervolting my CPU and RAM, and very occasionally (every 6 months or so?) one of those seems to generate a correctable ECC error that gets propagated to warning messages on my terminal. Haven't bothered investigating any further though. ;)

Laptops with ECC memory are expensive and, for now, available only with Intel CPUs (it should be possible to use mobile AMD CPUs, but I have never seen any such product). They are sold as "mobile workstations" by Dell, Lenovo and HP. I have a Dell Precision mobile workstation with ECC memory, bought in 2016, and it still works fine. However, I had to pay EUR 3000 for it in 2016 (it had an NVIDIA Quadro GPU and 32 GB of ECC memory), and now something similar would be even more expensive.

For desktops it is much easier to choose ECC memory, because the additional cost (the cost of the memory modules is 50% higher for DDR5-4800) remains a small fraction of the cost of an entire computer.

What is needed is to buy a motherboard with ECC support.

An example of a good motherboard with ECC support is ASUS PRIME X670E-PRO WIFI (for AMD Ryzen). I have been using a similar ASUS motherboard with ECC memory from the previous X570 generation for the last 5 years and it still works fine.

There are several other such motherboards, mainly from ASUS and ASRock.

For Intel Raptor Lake there are fewer such motherboards and they are more expensive, but they are available from ASUS (Pro WS W680M-ACE SE) and from Supermicro, as "workstation motherboards".

--
It will detect (by crashing) enough to make exploitation impractical. That is the key point.
I would say that 60% success per trial is a good chance.
In the process of generating one triple flip, many, many, many, many, many single and double flips will occur and will be caught. That is why ECC is still an effective defense. Attackers don't just get to go straight to their end game.
You can cause any number of single and double flips without worry. It's not a defence, as the attacker can retry until ECC labels an error as uncorrectable. AFAIK there is no cost in retrying.
That's true, but none of it is silent. Corrected errors get reported and it will be obvious that something is going wrong to anyone who's paying attention.
--
The ECCploit paper has extensive discussion of all the ways their work is detected, and how they even use detection to probe the correction structure. This is not a silent attack. This is proof that ECC is a penetrable defense. Which we all know! The question is how difficult it is and how stealthily it can be done.

But regardless, ECC still sounds the alarm when it's being attacked. If no one listens, there's not much ECC can do about that.

That's true for encryption too.