by estebarb
637 days ago
It is a completely different kind of parallelism. Seastar makes it easier to leverage parallelism across many cores. Here Valkey is leveraging instruction-level parallelism within a single core. A single CPU core can actually execute more than one instruction at a time, thanks to out-of-order execution. The trick to exploiting out-of-order execution is avoiding data dependencies between nearby instructions. By swapping the iteration order, they allow the CPU core to start on the next iteration before the previous one has finished. Why? Because there is no data dependency between them anymore! I haven't profiled that code, but I guess that the bottleneck would now be the sum. But that doesn't matter, as accessing a register is the fastest operation. Accessing the memory cache is slower, and accessing RAM is slower still.
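To make the dependency argument concrete, here is a minimal sketch in C. It is not Valkey's actual code: the `node` type and the two function names are made up for illustration, and it assumes the workload is summing values across several linked lists. In the first version each load of `n->next` has to wait for the previous load, so the core mostly stalls. In the second, the loads for different lists do not depend on each other, so the core can have several of them in flight at once.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
typedef struct node {
    struct node *next;
    unsigned long value;
} node;

/* One list at a time: every n->next load depends on the load before it,
 * so the core cannot start the next step until the current one returns. */
unsigned long sum_sequential(node **lists, size_t count) {
    unsigned long sum = 0;
    for (size_t i = 0; i < count; i++) {
        for (node *n = lists[i]; n != NULL; n = n->next) {
            sum += n->value;
        }
    }
    return sum;
}

/* Swapped iteration order: take one step in every list per pass.
 * The loads for lists[0], lists[1], ... are independent of each other,
 * so out-of-order execution can overlap them.
 * Note: this uses the lists array as a set of cursors and leaves every
 * entry NULL when it returns. */
unsigned long sum_interleaved(node **lists, size_t count) {
    unsigned long sum = 0;
    size_t remaining = count;
    while (remaining > 0) {
        remaining = 0;
        for (size_t i = 0; i < count; i++) {
            if (lists[i] != NULL) {
                sum += lists[i]->value;
                lists[i] = lists[i]->next;
                remaining++;
            }
        }
    }
    return sum;
}
```

Both functions return the same total. The only dependency left in the interleaved version is the running `sum`, which lives in a register, so whether it is actually faster comes down to how many of the node loads miss the cache. I haven't measured this sketch either.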