
by mips_r4300i 971 days ago

I too am not seeing the gotcha here. The paper seems to be:

1. We ran a bunch of DDR4 outside spec, driven directly from an FPGA; the RAM failed, to nobody's surprise, and we characterized the failures.

2. We found a way to bamboozle the CPU's RAM controller into producing a similar effect.

I've written SDR and DDR1/2 RAM controllers in Verilog, so I'm very familiar with autorefresh timing. You need to refresh each row every 64 ms, so if you have, say, 1024 rows, you must issue an autorefresh at least every 62.5 µs. Newer RAMs have some allowances to optimize for PVT (process, voltage, temperature), but this has been a fundamental requirement of DRAM for almost 40 years.
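To make the arithmetic concrete, here is a minimal Python sketch. The function name `refresh_interval_us` is made up for illustration; the 64 ms and 1024-row figures are the ones above, and the 8192-row case is an extra data point I'm assuming is typical of modern parts.

```python
def refresh_interval_us(retention_ms: float, rows: int) -> float:
    """Longest allowed average gap between autorefresh commands, in microseconds.

    Every row must be refreshed once per retention window, and each
    autorefresh command covers one row, so the window is split evenly.
    """
    return retention_ms * 1000.0 / rows


print(refresh_interval_us(64, 1024))  # 62.5
print(refresh_interval_us(64, 8192))  # 7.8125 (assumed row count, not from the paper)
```

Issuing refreshes any slower than this on average means some row goes longer than 64 ms without one.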

In ye olden days of EDO and FPM DRAMs, before synchronous parts, you had to manually select each row to refresh. Nowadays you just send an autorefresh command with no argument. The chip itself maintains a row counter and auto-increments it for round-robin refresh.
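A toy model of that internal counter, as a rough sketch only. The class `AutoRefreshDram` and its fields are hypothetical names, and a real chip does this in hardware rather than exposing anything like it.

```python
class AutoRefreshDram:
    """Toy DRAM that picks the row to refresh itself, round robin."""

    def __init__(self, rows: int) -> None:
        self.rows = rows
        self.next_row = 0  # internal counter, invisible to the controller
        self.last_refreshed_us = [0.0] * rows

    def auto_refresh(self, now_us: float) -> int:
        """Handle one autorefresh command; note that it takes no row argument."""
        row = self.next_row
        self.last_refreshed_us[row] = now_us
        self.next_row = (row + 1) % self.rows  # wrap around after the last row
        return row


dram = AutoRefreshDram(rows=4)
order = [dram.auto_refresh(now_us=float(t)) for t in range(6)]
print(order)  # [0, 1, 2, 3, 0, 1]
```

The controller only decides when to refresh; which row gets refreshed is entirely up to the chip.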

I see 2 potential snags. The first is that JEDEC says you're sometimes allowed to defer refresh for up to 9 periods, but you then have to issue extra refreshes later to make up for it.
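Roughly how I'd picture a controller tracking that debt, as a sketch under assumptions. `RefreshScheduler` is a hypothetical name, and the default of 8 postponed commands (giving a worst-case gap of 9 intervals) is my reading of the rule, not something taken from the paper.

```python
class RefreshScheduler:
    """Tracks refreshes owed to the DRAM and forces catch-up at a limit."""

    def __init__(self, max_postponed: int = 8) -> None:
        self.max_postponed = max_postponed  # assumed limit, see note above
        self.owed = 0

    def tick(self, bus_busy: bool) -> int:
        """Call once per refresh interval; returns how many refreshes to issue now."""
        self.owed += 1
        if bus_busy and self.owed <= self.max_postponed:
            return 0  # defer: the debt is still within the allowed limit
        issued = self.owed  # pay back everything that was postponed
        self.owed = 0
        return issued


sched = RefreshScheduler()
issued = [sched.tick(bus_busy=True) for _ in range(9)]
print(issued)  # [0, 0, 0, 0, 0, 0, 0, 0, 9]
```

The average refresh rate is unchanged; only the spacing gets lumpy, and the longest gap is bounded.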

The second is whether Intel cut corners in their controller. The controller should enforce a hard cutoff once a row has stayed open too long. The paper mentions this as a potential mitigation (but isn't this simply a hard design rule anyway?). It then says such mitigations would not work because "the row would've been open for too long before refresh anyway". I cannot follow this bafflingly circular logic.
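By "hard cutoff" I mean something like the following sketch. `RowTimer` and `max_open_ns` are hypothetical; the actual limit would come from the DRAM datasheet, and I'm not claiming this is how Intel's controller works.

```python
from typing import Optional


class RowTimer:
    """Tracks how long the current row has been open and flags the cutoff."""

    def __init__(self, max_open_ns: float) -> None:
        self.max_open_ns = max_open_ns  # hypothetical limit from the datasheet
        self.opened_at_ns: Optional[float] = None

    def activate(self, now_ns: float) -> None:
        """Record the time at which a row was opened."""
        self.opened_at_ns = now_ns

    def precharge(self) -> None:
        """Row closed; nothing to time until the next activate."""
        self.opened_at_ns = None

    def must_precharge(self, now_ns: float) -> bool:
        """True once the open row has reached the limit and must be closed."""
        if self.opened_at_ns is None:
            return False
        return now_ns - self.opened_at_ns >= self.max_open_ns


timer = RowTimer(max_open_ns=1000.0)
timer.activate(now_ns=0.0)
print(timer.must_precharge(now_ns=999.0))   # False
print(timer.must_precharge(now_ns=1000.0))  # True
```

A controller that checks this every cycle cannot leave a row open past the limit, no matter what access pattern the CPU throws at it.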

What am I missing?