Hacker News new | ask | show | jobs
by dragontamer 1106 days ago
> Basically it uses popcnt to know how many there are and uses tzcnt repeatedly to find the position of a single one.

BLSI and BLSR are better instructions for iterating over a bitset and finding the 1s (BLSI) and then removing the 1s (BLSR).

Furthermore, it only took 3 assembly instructions to implement before the BMI instruction set (see: https://www.chessprogramming.org/General_Setwise_Operations#... )

You're correct on popcnt, but this kind of iteration I normally see with BLSI / BLSR, not with TZCNT. I feel like TZCNT is slower and less elegant for this purpose.

---------

To go from a single binary bit selected (ex: 0b00001000) into its index (ie: 3 in this case) I guess TZCNT could be used though. But I would have done popcnt(x-1) instinctively.

TZCNT would be faster though, so there's that.

------

But all of these operations are only ~2 to 3 assembly instructions. Roughly on the same scale as absolute value. With exception of popcnt of course.

----

EDIT: Okay, I think I see how TZCNT comes into play now. Thanks! Just had to think it through.