by 414owen
1073 days ago
Very interesting approach. I should probably have specified that the somewhat naive assembly in `02-the-same-speed-as-c/loop-5.x64.s` is the fastest version I have. On my machine I get 0.244s for `loop-5.x64.s` and 0.422s for your implementation above. I'm not sure exactly why we're seeing this discrepancy; for what it's worth, your implementation looks faster to me on paper. I guess this is why you always need to benchmark on the hardware you're going to run the code on...

I would have expected yours to be faster, given that it needs to execute fewer instructions per loop iteration. Though maybe the CPU can run `adc` on more ports than it can a load from memory?
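
For anyone who wants to repeat the comparison on their own hardware, here is a minimal timing sketch in C. It is not the harness used for the numbers above. It assumes a POSIX system with `clock_gettime`, and the names `candidate_fn`, `best_time`, and `count_bytes` are hypothetical; `count_bytes` is only a stand-in workload, not either of the implementations discussed here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for a routine under test (an assumption). */
typedef int (*candidate_fn)(const char *input);

/* Current monotonic time in seconds. */
static double seconds_now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn(input) `runs` times and returns the fastest wall-clock time
 * in seconds. Taking the minimum reduces noise from other processes.
 * The result of the last call is stored in *result_out if it is non-NULL. */
double best_time(candidate_fn fn, const char *input, int runs, int *result_out) {
    assert(runs > 0);
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        double start = seconds_now();
        int result = fn(input);
        double elapsed = seconds_now() - start;
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
        if (result_out != NULL) {
            *result_out = result;
        }
    }
    return best;
}

/* Stand-in workload (an assumption): counts bytes up to the terminator. */
int count_bytes(const char *input) {
    int n = 0;
    while (input[n] != '\0') {
        n++;
    }
    return n;
}
```

Calling `best_time` once per candidate with the same input and the same `runs` gives directly comparable numbers, and comparing the values written to `result_out` checks that both candidates agree before trusting the timings.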