by ajross 3952 days ago
If you're getting a 4x difference in IPC using a crypto microbenchmark from compiled C code (i.e. it doesn't sound like you're bandwidth or I/O limited), there has to be something else at work. POWER8 is a nice core, but it's not that wide. Maybe the compiler was recognizing your operations and replacing them with AES primitives?

Caches and memory latency/bandwidth can have serious effects as well.
Yes, but at this kind of multiplier only in the case where the entire test is 100% cache-resident on one CPU and spilling on the other. Crypto stuff tends to have small working sets, so my intuition is that it's got to be something else.
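To put rough numbers on the "small working set" intuition, here is a back-of-the-envelope sketch. It assumes a table-based AES-128 implementation (the thread doesn't say which cipher or implementation was benchmarked), an assumed 4 KiB data buffer, and made-up L1 data cache sizes for two hypothetical cores. None of these figures are measurements from the benchmark under discussion.

```python
# Illustrative only: every size below is an assumption, not data from the thread.

# A classic table-based AES uses four lookup tables of 256 32-bit words each.
AES_TTABLE_BYTES = 4 * 256 * 4        # 4096 bytes (4 KiB)
# AES-128 expands the key into 11 round keys of 16 bytes each.
AES128_ROUND_KEYS_BYTES = 11 * 16     # 176 bytes
# Hypothetical buffer being encrypted in the microbenchmark.
ASSUMED_BUFFER_BYTES = 4 * 1024


def fits_in_cache(working_set_bytes: int, cache_bytes: int) -> bool:
    """Return True if the whole working set can stay resident in the cache."""
    return working_set_bytes <= cache_bytes


working_set = AES_TTABLE_BYTES + AES128_ROUND_KEYS_BYTES + ASSUMED_BUFFER_BYTES

# Hypothetical L1 data cache sizes for the two cores being compared.
l1d_sizes = {"core_a": 32 * 1024, "core_b": 64 * 1024}

residency = {
    name: fits_in_cache(working_set, size) for name, size in l1d_sizes.items()
}

for name, resident in residency.items():
    print(f"{name}: working set {working_set} B, L1D-resident: {resident}")
```

Under these assumptions the working set is a few KiB and stays resident on both cores, so a resident-on-one, spilling-on-the-other split would need a much larger working set than a typical crypto kernel has.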