Hacker News new | ask | show | jobs
by codedokode 44 days ago
> I was a bit confused by the if-else chains you have e.g. on line 111 where it looks like you could just use the else branch for every case?

The point of "ifs" there is to replace a variable "size" with a constant like 8, 16 so that the compiler can hardcode it in the code and maybe make it more efficient (for example, instead of counting the iterations the compiler can just repeat the instruction 8 times when iteration count is 8). I expect that the compiler would generate customized "linear_search" function versions instead of using generic version with unknown number of iterations. You can see the example in godbolt assembly code at line 825, where the compiler unrolled a loop for "linear_search" function with size = 8 - there is no loop, just 8 comparisons.

> As for the branch prediction, I would think most of the branches would be predicted correctly because usually you'll have runs of "not x" before each "is x". Switching between many loops of course hurts the prediction.

I don't think so. In binary search there is no predictable pattern, so the predictor would miss in 50% of cases. And in linear search there is a guaranteed miss in the last iteration.

> You could circumvent the counter issue by pinning to a specific core,

I think I'll try this in future.

> but I find I have to write some code yourself

I can write it myself, but it would take 20-30 minutes including time to find and read the docs for functions while for LLM it takes less than a minute. But anyway I have to edit the result a lot, for example LLM "forgot" that you need to use the result of the search to prevent compiler from throwing the code away. Also, free version of ChatGPT became much dumber couple of months ago.

1 comments

Fair point that binary search causes a lot of branch mispredictions. Did you ever measure whether the unrolled linear search with 8 elements outperforms a "rolled" one? Because with instruction level parallelism I wonder how much difference removing 8 mostly easily predicted (in the linear search) branches makes.
I didn't try, but loop contains a branch so there could be misprediction, which doesn't happen with unrolled loop. Also, instruction level parallelism applies to unrolled code as well.