by crest 816 days ago
Speculation is required to get even close to the single-thread throughput expected of any modern CPU for anything worth running on a general purpose CPU. The problem is that there is no formally specified model for reasoning about side channels, not even for timing side channels alone. Most ISAs don't specify the time it takes to execute instructions.

Let's assume I multiply two 64-bit numbers. The CPU could just do it the same way every time, with a worst-case latency of 4 cycles. As an extreme example, it could also track whether one of the factors is zero and dynamically replace the multiplication with a zeroing idiom that "executes" in 0 cycles once the scheduler learns that either input is zero.

Less radically, it could track whether the upper halves of registers are zero to fast-path smaller multiplications (e.g. 32-bit x 32-bit -> 64-bit) and shave off a cycle. IIRC some PowerPC chips did that, but for the second argument only. The ISA allowed it.
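
To make the leak concrete, here is a toy latency model of such a multiplier. It is a sketch, not a description of any real chip: the function name `mul_latency` and the cycle counts (0, 3, 4) are assumptions taken from the hypothetical numbers above, and it checks both operands, whereas a real design might only look at one.

```python
MASK64 = (1 << 64) - 1

def mul_latency(a: int, b: int) -> int:
    """Toy model: cycles a hypothetical 64-bit multiplier spends on a * b."""
    a &= MASK64
    b &= MASK64
    if a == 0 or b == 0:
        return 0  # assumed zeroing-idiom fast path
    if (a >> 32) == 0 and (b >> 32) == 0:
        return 3  # assumed 32x32 -> 64 fast path, one cycle saved
    return 4      # full-width worst case

# An observer who can time the multiply learns something about a secret
# operand without ever seeing the product.
public = 7
for secret in (0, 0xDEADBEEF, 1 << 40):
    print(hex(secret), mul_latency(secret, public))
```

Under this model the latency alone tells an attacker whether the secret is zero and whether it fits in 32 bits, and nothing in the ISA would have to say so.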

A realistic example is CPUs whose shift/rotate instructions have data-dependent latency. What do you do if an ISA doesn't specify whether shift/rotate is constant time, but every implementation so far has been constant time? Do you slowly emulate it out of paranoia that a future implementation may have variable latency? Another real-world example is FPUs that have higher latency for denormalised numbers; it's just not relevant to (most) cryptographic algorithms.
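
For a sense of what "slowly emulate it" looks like, here is a sketch of a 64-bit rotate-left by a secret amount built only from rotates by fixed distances plus a mask select, i.e. a barrel shifter in software. The names `rotl_fixed` and `rotl_emulated` are made up for illustration. Python integers are not constant time, so this only shows the structure; real code would be C or assembly, and would still be trusting that fixed-distance shifts, AND and OR are constant time.

```python
MASK64 = (1 << 64) - 1

def rotl_fixed(x: int, k: int) -> int:
    """Rotate left by a public, fixed distance k (0 < k < 64)."""
    return ((x << k) | (x >> (64 - k))) & MASK64

def rotl_emulated(x: int, n: int) -> int:
    """Rotate x left by secret n (0..63) without a variable-distance shift of x."""
    x &= MASK64
    for bit in range(6):                   # distances 1, 2, 4, 8, 16, 32
        rotated = rotl_fixed(x, 1 << bit)  # always computed, needed or not
        take = -((n >> bit) & 1) & MASK64  # all ones if this bit of n is set, else 0
        x = (rotated & take) | (x & ~take & MASK64)  # branch-free select
    return x

print(hex(rotl_emulated(0x8000000000000001, 4)))
```

That is six rotates and six selects in place of one instruction, which is the price of not trusting something the ISA never promised in the first place.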

How the fuck are you supposed to build anything secure, useful, and fast enough from that?