by jhallenworld 970 days ago
So this is a direct DRAM spec violation: the DRAM datasheet has a spec known as tRAS (RAS is row address strobe; tRAS is the time from row open (read) to row close (write back)). Min is 33 ns, max is 9*tREFI. tREFI (average refresh period) depends on temperature: below 85C, it's 7.8 us. So tRAS max is about 70 us (9 * 7.8 us = 70.2 us). (This is from some random Micron DDR4 datasheet.)
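
As a quick sanity check on that arithmetic, here is a minimal Python sketch. The constants are just the values quoted above from that one datasheet, and the function names are made up for illustration; other parts and temperature ranges will have different numbers.

```python
# Values quoted in the comment above (one Micron DDR4 datasheet, below 85C).
TREFI_US = 7.8          # average refresh interval, in microseconds
TRAS_MIN_NS = 33.0      # minimum row-active time, in nanoseconds
MAX_REFI_MULTIPLE = 9   # tRAS max is given as 9 * tREFI


def tras_max_us(trefi_us: float = TREFI_US) -> float:
    """Upper bound on how long a row may stay open, in microseconds."""
    return MAX_REFI_MULTIPLE * trefi_us


def row_open_time_in_spec(open_ns: float, trefi_us: float = TREFI_US) -> bool:
    """True if a row-open duration (in ns) lies between tRAS min and tRAS max."""
    return TRAS_MIN_NS <= open_ns <= tras_max_us(trefi_us) * 1000.0
```

With these numbers, a row held open for 100 us (100,000 ns) is outside the window, while one held open for 50 us is inside it.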

Um, so of course they can trigger problems when they violate the spec!

Were they able to find a DRAM controller that violates the spec? If so, that's a simple bug in the DRAM controller. Well I guess so, the paper mentions Intel i5-10400 (Comet Lake). Do AMD processors have this issue?


Most vulnerabilities are spec violations. If there is a real system with this bug, then that system is vulnerable.

The distinction with RowHammer is not if it is a vulnerability, but what component can be blamed for the vulnerability.

I too am not seeing the gotcha here. The paper seems to be:

1. We ran a bunch of DDR4 outside spec directly with an FPGA, the RAM failed (to nobody's surprise), and we characterized it.

2. We found a way to bamboozle the CPU's ram controller to achieve a similar effect.

I've written SDR and DDR1/2 RAM controllers in Verilog, and I'm very familiar with autorefresh timing. You need to refresh each row every 64 ms, and if you have, say, 1024 rows, then you must issue an autorefresh at least every 62.5 us. In newer RAMs there are some allowances to optimize for PVT, but this is a fundamental requirement of DRAM spanning almost 40 years.
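
The interval is just the retention window divided by the number of refresh commands needed to cover every row. A small Python sketch, assuming one row is refreshed per autorefresh command (the function name is hypothetical):

```python
def max_autorefresh_interval_us(retention_ms: float, rows: int) -> float:
    """Longest average gap between autorefresh commands, in microseconds,
    assuming each command refreshes exactly one row."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return retention_ms * 1000.0 / rows


# The 1024-row example from the comment above.
interval_1024 = max_autorefresh_interval_us(64.0, 1024)

# 8192 refresh commands per 64 ms gives the familiar ~7.8 us tREFI.
interval_8192 = max_autorefresh_interval_us(64.0, 8192)
```

For 1024 rows this works out to 62.5 us, and for 8192 refresh commands per 64 ms it gives 7.8125 us, which is where the 7.8 us tREFI figure comes from.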

In ye olden days of EDO and FPM DRAMs, before synchronous DRAM, you had to manually select each row to refresh. Nowadays you just send an autorefresh command with no argument. The chip itself maintains a row counter and auto-increments it for round-robin refresh.
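
A toy Python model of that internal counter, purely to illustrate the round-robin behaviour. The class and method names are invented, and real parts typically refresh more than one row per command, so treat this as a sketch rather than how any particular chip works.

```python
class AutoRefreshCounter:
    """Toy model of a DRAM's internal refresh row counter."""

    def __init__(self, rows: int) -> None:
        if rows <= 0:
            raise ValueError("rows must be positive")
        self.rows = rows
        self.next_row = 0

    def autorefresh(self) -> int:
        """Handle one argument-less autorefresh command.

        Returns the row that gets refreshed, then advances the counter,
        wrapping back to row 0 after the last row.
        """
        row = self.next_row
        self.next_row = (self.next_row + 1) % self.rows
        return row


counter = AutoRefreshCounter(rows=4)
refreshed = [counter.autorefresh() for _ in range(5)]
```

The controller never names a row; after `rows` commands every row has been refreshed once and the counter wraps.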

I see 2 potential snags. The first is, JEDEC says that you're sometimes allowed to defer refresh 9 periods. But you do have to refresh more later to make up for it.

The second is if Intel cut corners in their controller. The controller should enforce a hard cutoff after the row stays open too long. The paper mentions this as a potential mitigation (but isn't this simply a hard design rule anyway?). The paper says such mitigations would not work because "the row would've been open for too long before refresh anyway". I cannot follow this bafflingly circular logic.
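
To make the "hard cutoff" idea concrete, here is a minimal Python sketch of a hypothetical per-bank watchdog. Everything in it (the class, the method names, the 70,200 ns limit taken from 9 * 7.8 us) is an assumption for illustration; it says nothing about how Intel's or anyone else's controller is actually built.

```python
from typing import Optional


class RowOpenWatchdog:
    """Hypothetical per-bank timer that flags a row open for too long."""

    def __init__(self, tras_max_ns: int) -> None:
        self.tras_max_ns = tras_max_ns
        self.opened_at_ns: Optional[int] = None  # None means no row is open

    def activate(self, now_ns: int) -> None:
        """Record the time at which a row was opened."""
        self.opened_at_ns = now_ns

    def precharge(self) -> None:
        """Record that the row was closed."""
        self.opened_at_ns = None

    def must_precharge(self, now_ns: int) -> bool:
        """True once the open row has reached the tRAS max limit."""
        if self.opened_at_ns is None:
            return False
        return now_ns - self.opened_at_ns >= self.tras_max_ns


watchdog = RowOpenWatchdog(tras_max_ns=70_200)  # assumed limit: 9 * 7.8 us
watchdog.activate(now_ns=0)
early = watchdog.must_precharge(now_ns=33)
late = watchdog.must_precharge(now_ns=70_200)
```

In this model the scheduler would poll `must_precharge` each cycle and force a precharge when it returns true, regardless of pending requests to the open row.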

What am I missing?

After having read the document now I think I see the misunderstanding.

From my understanding the attack is:

1. Hold a row open for 70us

2. Ram controller may refresh a row

3. Go to 1

I saw nothing that mentioned a hard requirement of 64ms referenced in the post you replied to - they only mentioned that they kept it within 64ms to "Prevent data-retention failures from interfering with read-disturb failures"

The CPU has microcode... I wonder if that includes its DRAM controller.
Maybe they're using DDR3? Looking at a Micron DDR3 datasheet, there's no maximum for active-to-precharge.

:edit: no, I should have read the paper - they tested with DDR4. Strange

I found the "ACTIVATE-to-PRECHARGE command period" in a Micron DDR3 datasheet, same value: 9 x tREFI

https://www.mouser.com/datasheet/2/671/4Gb_DDR3L-1283964.pdf

Page 78, speed bin tables.

Refresh is basically an activate/precharge sequence, so keeping a row open for a long time is the same as denying refresh to that bank.