Hacker News new | ask | show | jobs
by wolpoli 820 days ago
> The attacker will need to cause dozens of machine halts in order to achieve even a single exploitable bitflip. Dozens of machine halts is not something that goes undetected.

Is there a process for the operations team managing the system to figure out that it was an attack and not just flaky hardware?

4 comments

Memory bit flips are very rare.

Normally a memory error does not happen more than a few times per year, unless you have a huge amount of memory.

Therefore when 2 memory correctable or uncorrectable errors happen in the same day, that should be enough to trigger an immediate report to the user or administrator of the computer that either there is an ongoing RowHammer attack that must be stopped or one of the memory modules is approaching its end-of-life due to aging and it must be replaced before it will begin to have very frequent memory errors.

At least on server computers it should be easy to configure their logging system so that a second memory error per day, even if it was correctable, should immediately send an e-mail message and/or an SMS to the administrator.

If that's the case, then I guess they would take physical server offline. And if other machines started showing similar signs of failure, then they would analyze the logs for possible row hammer attack?
Sure: you replace the hardware with brand new hardware and it keeps happening. Then you know it's not the hardware.
The same workload starts crashing after migrating to multiple machines?
Sounds like a process thing that would need to be developed by each team. So probably a mix of results there.