| > Basically it uses popcnt to know how many there are and uses tzcnt repeatedly to find the position of a single one. BLSI and BLSR are better instructions for iterating over a bitset and finding the 1s (BLSI) and then removing the 1s (BLSR). Furthermore, it only took 3 assembly instructions to implement before the BMI instruction set (see: https://www.chessprogramming.org/General_Setwise_Operations#... ) You're correct on popcnt, but this kind of iteration I normally see with BLSI / BLSR, not with TZCNT. I feel like TZCNT is slower and less elegant for this purpose. --------- To go from a single binary bit selected (ex: 0b00001000) into its index (ie: 3 in this case) I guess TZCNT could be used though. But I would have done popcnt(x-1) instinctively. TZCNT would be faster though, so there's that. ------ But all of these operations are only ~2 to 3 assembly instructions. Roughly on the same scale as absolute value. With exception of popcnt of course. ---- EDIT: Okay, I think I see how TZCNT comes into play now. Thanks! Just had to think it through. |