Hacker News
by exrook 970 days ago
For those interested, the key takeaway from this IMO is that by issuing many sequential reads, the memory controller will hold a target row open for an extended amount of time to service the consecutive accesses.

This is in contrast to the original rowhammer attack, which issues accesses such that target rows are repeatedly opened and closed to trigger bitflips in neighboring rows.

By stretching out the row open time to 30ms (!), the authors claim they are able to reliably trigger bitflips with a single row opening in 13% of tested rows at 50°C[1]. Some rows in certain chips can be flipped with access times of under 10ms[2].

At more realistic row open times of 7.8 to 70 us, there seems to be a 1/x relationship between row open time and the number of activations required. The cumulative time the row needs to be held open to trigger a flip seems to stay fairly constant (around 50 ms total, from my very approximate estimates). Note that the attack needs to be executed in under 64 ms total; otherwise the automatic DRAM refresh will reset any progress made.
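The arithmetic behind that 1/x trend can be sketched as follows. This is a rough model under one stated assumption: that a flip needs about 50 ms of cumulative row-open time however it is split up (the very approximate estimate above, not a figure from the paper). The function name is hypothetical, and the model ignores the time spent closing and reopening the row.

```python
# ASSUMPTION: a flip needs roughly 50 ms of cumulative row-open time,
# however it is split up (the commenter's rough estimate, not the paper's).
CUMULATIVE_OPEN_NS = 50_000_000


def activations_needed(t_open_us: float) -> int:
    """Activations needed if each one keeps the row open for t_open_us."""
    t_open_ns = round(t_open_us * 1000)
    if t_open_ns <= 0:
        raise ValueError("open time must be positive")
    # Ceiling division: a partial activation still counts as a whole one.
    return -(-CUMULATIVE_OPEN_NS // t_open_ns)


for t_open_us in (7.8, 30.0, 70.0):
    print(f"{t_open_us:5.1f} us open -> ~{activations_needed(t_open_us)} activations")
```

Roughly 9x longer open time per activation means roughly 9x fewer activations, which is the constant-cumulative-time reading of the trend.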

The authors demonstrate this attack with a userspace program that maps a 1 GB hugepage to be able to directly manipulate the lower 30 physical address bits[3], although they don't seem to provide the row open times they end up being able to achieve in practice.
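Why the 1 GB hugepage helps: the page spans 2^30 bytes, and assuming it is also 2^30-byte aligned in physical memory (as huge pages are on x86-64), a byte's offset inside the page equals the lower 30 bits of its physical address. A minimal sketch of that arithmetic; the function names and the example base address are hypothetical, and which of those 30 bits select bank, row, and column is controller-specific and not modeled here.

```python
HUGEPAGE_SIZE = 1 << 30          # 1 GiB
LOW_BITS_MASK = HUGEPAGE_SIZE - 1


def low_phys_bits(vaddr: int, page_vbase: int) -> int:
    """Lower 30 physical-address bits of vaddr inside a 1 GiB hugepage.

    ASSUMPTION: page_vbase is the virtual start of a 1 GiB hugepage that is
    1 GiB-aligned physically, so the in-page offset equals phys & mask.
    """
    offset = vaddr - page_vbase
    if not 0 <= offset < HUGEPAGE_SIZE:
        raise ValueError("address is outside the hugepage")
    return offset


def vaddr_for_low_bits(page_vbase: int, low_bits: int) -> int:
    """Virtual address whose physical address has the given lower 30 bits."""
    return page_vbase + (low_bits & LOW_BITS_MASK)
```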

The attack code itself: https://github.com/CMU-SAFARI/RowPress/blob/main/demonstrati...

https://arxiv.org/pdf/2306.17061.pdf

[1] pg. 5, Obsv. 2
[2] pg. 6, Obsv. 6
[3] pg. 11, Sec. 6.1


So this is a direct DRAM spec violation. The DRAM datasheet specifies tRAS (named after the row address strobe): the time from row open (activate, which reads the row out) to row close (precharge, which writes it back). Min is 33 ns; max is 9 x tREFI. tREFI (the average refresh interval) depends on temperature: below 85°C it's 7.8 us, so tRAS max is about 70 us. (These numbers are from some random Micron DDR4 datasheet.)
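The spec arithmetic in this comment as a quick check. The constants are the Micron DDR4 values quoted above; the helper name is hypothetical. The last line shows how far outside the window the 30 ms single-activation case from the top of the thread lies.

```python
# Micron DDR4 values quoted above (operating below 85 C).
T_REFI_US = 7.8                 # average refresh interval
T_RAS_MIN_US = 33 / 1000        # 33 ns minimum ACT-to-PRE time
T_RAS_MAX_US = 9 * T_REFI_US    # ~70.2 us maximum ACT-to-PRE time


def within_tras(open_time_us: float) -> bool:
    """True if holding a row open this long stays inside the tRAS window."""
    return T_RAS_MIN_US <= open_time_us <= T_RAS_MAX_US


# The 30 ms row-open time, relative to tRAS max:
print(f"30 ms is {30_000 / T_RAS_MAX_US:.0f}x tRAS max")
```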

Um, so of course they can trigger problems when they violate the spec!

Were they able to find a DRAM controller that violates the spec? If so, that's a simple bug in the DRAM controller. Well, I guess so; the paper mentions the Intel i5-10400 (Comet Lake). Do AMD processors have this issue?

Most vulnerabilities are spec violations. If there is a real system with this bug, then that system is vulnerable.

The distinction with RowHammer is not if it is a vulnerability, but what component can be blamed for the vulnerability.

I too am not seeing the gotcha here. The paper seems to be:

1. We ran a bunch of DDR4 outside spec directly with an FPGA, the RAM failed to nobody's surprise, and we characterized it.

2. We found a way to bamboozle the CPU's RAM controller into producing a similar effect.

I've written SDR and DDR1/2 RAM controllers in Verilog, so I'm very familiar with autorefresh timing. You need to refresh each row every 64 ms, and if you have, say, 1024 rows, then you must issue an autorefresh at least every 62 us. Newer RAMs have some allowances to optimize for PVT (process, voltage, temperature), but this is a fundamental requirement of DRAM going back almost 40 years.
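The refresh-spacing arithmetic as a one-line helper (the name is hypothetical). Plugging in 8192 refresh commands per 64 ms window, the DDR4 figure, gives the 7.8 us tREFI from the datasheet values quoted earlier in the thread.

```python
def refresh_command_interval_us(retention_ms: float, refresh_commands: int) -> float:
    """Average spacing between autorefresh commands so that every row
    (or row group) is refreshed once per retention window."""
    return retention_ms * 1000 / refresh_commands


print(refresh_command_interval_us(64, 1024))  # 1024 rows, one per command
print(refresh_command_interval_us(64, 8192))  # DDR4: 8192 REFs per 64 ms
```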

In ye olden days of EDO and FPM DRAMs, before synchronous DRAM, you had to manually select each row to refresh. Nowadays you just send an autorefresh command with no argument; the chip itself maintains a row counter and auto-increments it for round-robin refresh.
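A toy model of that on-chip counter. The class name and the rows-per-command parameter are illustrative assumptions; real chips refresh implementation-specific groups of rows per command.

```python
class RefreshCounter:
    """Toy model of a DRAM chip's internal refresh row counter.

    An autorefresh command carries no address: the chip refreshes the
    row group its counter points at, then advances round-robin.
    """

    def __init__(self, num_rows: int, rows_per_refresh: int = 1) -> None:
        if num_rows % rows_per_refresh:
            raise ValueError("rows must divide evenly into refresh groups")
        self.num_rows = num_rows
        self.rows_per_refresh = rows_per_refresh
        self.next_row = 0

    def autorefresh(self) -> range:
        """Handle one REF command and return the rows it refreshed."""
        start = self.next_row
        self.next_row = (start + self.rows_per_refresh) % self.num_rows
        return range(start, start + self.rows_per_refresh)
```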

I see two potential snags. The first is that JEDEC says you're sometimes allowed to defer refresh, so up to 9 refresh periods can pass between refreshes. But you do have to refresh more later to make up for it.
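A sketch of that bookkeeping, assuming the DDR4 rule that at most 8 refresh commands may be postponed (so at most 9 x tREFI passes between refreshes) and that postponed ones must be issued later. The class and method names are hypothetical, and paying the whole debt back at once is a simplification; real controllers can spread it out.

```python
MAX_POSTPONED = 8  # ASSUMPTION: DDR4 limit on postponed REF commands


class RefreshScheduler:
    """Tracks refresh 'debt' as a controller postpones and catches up."""

    def __init__(self) -> None:
        self.debt = 0  # refreshes owed beyond the regular schedule

    def tick(self, busy: bool) -> int:
        """Called once per tREFI; returns how many REF commands to issue now."""
        owed = 1 + self.debt
        if busy and self.debt < MAX_POSTPONED:
            self.debt += 1  # postpone this interval's refresh
            return 0
        self.debt = 0       # forced (or idle): pay back everything owed
        return owed
```

Even a controller that is always busy ends up issuing one refresh per tREFI on average; it just issues them in bursts.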

The second is whether Intel cut corners in their controller. The controller should enforce a hard cutoff once a row has stayed open too long. The paper mentions this as a potential mitigation (but isn't this simply a hard design rule anyway?), then argues such mitigations would not work because "the row would've been open for too long before refresh anyway". I cannot follow this baffling, circular logic.

What am I missing?
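The hard cutoff asked about here amounts to checking each ACT-to-PRE interval against tRAS max. A sketch of such a check over a hypothetical single-bank command trace; the trace format and function names are assumptions, and the limit uses the DDR4 datasheet values quoted earlier in the thread.

```python
from typing import Optional

# ASSUMPTION: DDR4 tRAS max below 85 C, 9 x tREFI with tREFI = 7.8 us.
T_RAS_MAX_US = 9 * 7.8


def longest_open_us(trace: list[tuple[float, str]]) -> float:
    """Longest ACT-to-PRE interval in a single-bank command trace.

    trace: (time_us, command) pairs in time order, command "ACT" or "PRE".
    A row still open at the end of the trace is ignored.
    """
    longest = 0.0
    opened_at: Optional[float] = None
    for t, cmd in trace:
        if cmd == "ACT":
            if opened_at is not None:
                raise ValueError(f"ACT at {t} us while a row is already open")
            opened_at = t
        elif cmd == "PRE" and opened_at is not None:
            longest = max(longest, t - opened_at)
            opened_at = None
    return longest


def respects_tras_max(trace: list[tuple[float, str]]) -> bool:
    """True if every completed row-open interval is within tRAS max."""
    return longest_open_us(trace) <= T_RAS_MAX_US
```

A trace of back-to-back 70 us activations passes this check, because the limit bounds each individual open interval, not the cumulative open time.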

Having now read the paper, I think I see the misunderstanding.

As I understand it, the attack is:

1. Hold a row open for 70us

2. Ram controller may refresh a row

3. Go to 1
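The loop above can be modeled to show why a per-activation cap alone doesn't bound cumulative open time. A rough sketch; the per-cycle overhead is a hypothetical parameter covering the precharge, any refresh the controller slips in, and the reactivation.

```python
WINDOW_US = 64_000  # refresh window: each row is refreshed every 64 ms


def cumulative_open_us(t_open_us: float, overhead_us: float) -> float:
    """Total open time a row can accumulate inside one refresh window when
    it is repeatedly opened for t_open_us, with overhead_us (hypothetical)
    spent closing, refreshing, and reopening between openings."""
    if t_open_us <= 0 or overhead_us < 0:
        raise ValueError("open time must be positive, overhead non-negative")
    cycles = int(WINDOW_US // (t_open_us + overhead_us))
    return cycles * t_open_us
```

Even with a generous 10 us of overhead per cycle, a row capped at 70 us per activation is open for about 56 ms of each 64 ms window, the same range as the ~50 ms cumulative estimate at the top of the thread.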

I saw nothing mentioning the hard 64 ms requirement referenced in the post you replied to; the paper only says they kept it within 64 ms to "prevent data-retention failures from interfering with read-disturb failures".

The CPU has microcode... I wonder if that includes its DRAM controller.
Maybe they're using DDR3? Looking at a Micron DDR3 datasheet, there's no maximum for active-to-precharge.

Edit: no, I should have read the paper; they tested with DDR4. Strange.

I found the "ACTIVATE-to-PRECHARGE command period" in a Micron DDR3 datasheet, same value: 9 x tREFI

https://www.mouser.com/datasheet/2/671/4Gb_DDR3L-1283964.pdf

Page 78, speed bin tables.

Refresh basically is an activate/precharge sequence, so keeping a row open long is the same as denying refresh to that bank.
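A toy model of that interaction, assuming a single bank in which a refresh can only proceed once the bank is precharged. Class and method names are hypothetical.

```python
from typing import Optional


class Bank:
    """Toy DRAM bank: a refresh is only possible while no row is open."""

    def __init__(self) -> None:
        self.open_row: Optional[int] = None
        self.refreshes = 0

    def activate(self, row: int) -> None:
        if self.open_row is not None:
            raise RuntimeError("must precharge before activating another row")
        self.open_row = row

    def precharge(self) -> None:
        self.open_row = None

    def refresh(self) -> bool:
        """Attempt a refresh; returns False (denied) while a row is open."""
        if self.open_row is not None:
            return False
        self.refreshes += 1
        return True
```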

Won't ECC memory be a sufficient defense against that? I think it was invented specifically to overcome random bit flips.

If so, server / cloud infrastructure is largely unaffected.

> I think it was invented specifically to overcome random bit flips

The trouble is that in an attack, the bit flips aren't random and uncorrelated; they are purposefully being made in a small memory region.

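All it takes is a few bit flips in the same word to defeat most ECC: common SECDED codes correct one flip and detect two, but three can be miscorrected or go undetected.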
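To make the SECDED point concrete, here is a toy extended Hamming(8,4) code: the same single-error-correct, double-error-detect principle as typical ECC memory, but with 4 data bits instead of 64. One flip is corrected, two are detected but can no longer be corrected, and three can be "corrected" into the wrong data. Function names are hypothetical; real memory controllers use wider, often vendor-specific codes.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword.

    Index 0 is the overall parity bit; indices 1..7 form Hamming(7,4),
    with parity bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    if len(data) != 4:
        raise ValueError("expected 4 data bits")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the 7 Hamming bits
    return code


def decode(code: list[int]) -> tuple[str, list[int]]:
    """Return (status, data bits) for a received 8-bit codeword."""
    code = code[:]
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions; 0 for a valid codeword
    overall = sum(code) % 2  # 1 if an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and fix it
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with nonzero syndrome: detected, not fixable.
        status = "uncorrectable"
    return status, [code[3], code[5], code[6], code[7]]
```

With three flips the decoder sees an odd parity and a plausible syndrome, so it confidently "fixes" the wrong bit.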

«Importantly, the researchers haven't demonstrated that ECCploit works against ECC in DDR4 chips, a newer type of memory chip favored by higher-end cloud services. They also haven't shown that ECCploit can penetrate hypervisors or secondary Rowhammer defenses. Nonetheless, the bypass of ECC is a major milestone»

Wow anyway.

Wait, did it or didn't it bypass ECC? The first sentence doesn't line up with the last sentence in the quote.
Any chance such access patterns could occur by accident?

Specific types of computations, processing datastructures with a specific layout, poorly written (but correct) code, ...?

"50 C" refers to 50°C as in average kinetic energy of particles. They did the measurements at elevated temperatures.
It isn't that elevated; RAM inside a laptop or server that is doing anything compute intensive will often be warmer than 50°C (122°F).
You are right. I was thinking elevated above room temperature, but that doesn't make much sense in this scenario.
I thought they might be trying to throw in some old Halt and Catch Fire trick