Hacker News
by sweis 4236 days ago
Long story short, memory bandwidth is much higher than the throughput the best x86 crypto implementations can sustain.

Encrypting disks or network is no problem today, but we'll need architectural changes to support full memory encryption without a performance hit.

4 comments

This could be done mostly transparently, with the encryption in the memory controller. Addresses and data are already scrambled with a (non-cryptographic) scrambling code for EMI reasons. Of course, a sufficiently fast hardware crypto core would be required.

EDIT: Also, I forgot that the last generation of consoles (and, I assume, the current one) has transparent encryption of main memory.

Indeed, there is discussion of encrypted memory on the controller in RISC-V at http://lowrisc.org
How do you square that with the performance of the AES-NI instructions? That is theoretically 16 bytes per cycle from the manual. Per core. That is way in excess of memory bandwidth, even with DDR4.
The theoretical maximum for current chips is less than 16 bytes per cycle. On Haswell you can process (in parallel) 7 blocks in roughly the time it would take to process 1. The latency of each round is 7 cycles, so a full 10-round AES-128 encryption takes ~70 cycles. That is 7 x 16 = 112 bytes per ~70 cycles, so effectively you can process at most 1.6 bytes per cycle, or 1.14 if you use 256-bit keys (14 rounds, ~98 cycles), ignoring the cost of key scheduling and overhead here.

Even if you dedicate all CPU cores to the task of encrypting memory, you still fall well short of theoretical memory bandwidth.
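
The arithmetic in this comment can be checked with a short script. This is only a back-of-the-envelope sketch: the round latency and the number of blocks in flight are the figures quoted in the comment, while the clock speed and core count passed in at the end are hypothetical values chosen for illustration, not measurements.

```python
# Rough per-core AES-NI throughput estimate, using the figures from the comment.
BLOCK_BYTES = 16       # AES block size in bytes
ROUND_LATENCY = 7      # cycles per AES round on Haswell (figure from the comment)
BLOCKS_IN_FLIGHT = 7   # independent blocks pipelined at once (figure from the comment)


def bytes_per_cycle(rounds: int) -> float:
    """Bytes encrypted per cycle when the pipeline is kept full."""
    cycles = rounds * ROUND_LATENCY
    return BLOCKS_IN_FLIGHT * BLOCK_BYTES / cycles


def gbytes_per_second(rounds: int, clock_ghz: float, cores: int = 1) -> float:
    """Aggregate throughput in GB/s; clock_ghz and cores are assumptions."""
    return bytes_per_cycle(rounds) * clock_ghz * cores


if __name__ == "__main__":
    print(f"AES-128: {bytes_per_cycle(10):.2f} bytes/cycle")
    print(f"AES-256: {bytes_per_cycle(14):.2f} bytes/cycle")
    # Hypothetical machine: 4 cores at 3.0 GHz, all doing nothing but AES-128.
    print(f"4 cores @ 3.0 GHz: {gbytes_per_second(10, 3.0, 4):.1f} GB/s")
```

Key scheduling and loop overhead are ignored, as in the comment, so real numbers would be lower.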

Do you believe it's reasonable to assume that AES performance will remain constant over the same 5-7 year timeframe? That's at least a couple of hardware generations for an improvement they could make in the current generation if there was a market for it.
There is certainly room for improvement, but I don't see a 16x speedup happening on a 5-year horizon using the current AES-NI instruction set.
Ah, I was forgetting about rounds, you're correct that you won't be able to match the memory bandwidth then.
The VIA C7 AES implementation could keep up with memory (ca. 20Gb/s). With suitable cipher modes you can use multiple pipelined units in parallel with negligible overhead.
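
Counter (CTR) mode is one example of a mode where this works, since each keystream block depends only on the key, a nonce, and a block index. The sketch below illustrates that independence in software. It is not the VIA C7 design, and `keystream_block` is a hypothetical stand-in built on SHA-256 purely so the example runs with the standard library; a real implementation would use an AES block encryption there.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per keystream block, matching the AES block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES_encrypt(key, nonce || counter). NOT real AES.
    material = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(material).digest()[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes, workers: int = 1) -> bytes:
    """Encrypt or decrypt data by XORing it with a counter-mode keystream.

    Each keystream block is computed independently, so the work can be
    spread over any number of workers (or hardware pipelines) in any order.
    """
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order regardless of completion order.
        blocks = pool.map(lambda i: keystream_block(key, nonce, i), range(n_blocks))
        stream = b"".join(blocks)
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    message = bytes(range(100))
    ciphertext = ctr_xor(key, nonce, message, workers=4)
    # One worker or four, the result is identical, and XOR undoes itself.
    assert ctr_xor(key, nonce, message, workers=1) == ciphertext
    assert ctr_xor(key, nonce, ciphertext, workers=4) == message
```

Chained modes such as CBC encryption do not parallelize this way, because each block needs the previous ciphertext block as input.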
Or fast, strong, pipelined hardware encryption.

AES is not the best you could do there.