Hacker News new | ask | show | jobs
by orlp 784 days ago
> It just makes it O(n) where n := 64

That's O(1).

> The LZCNT operation itself is a loop over all bits

That's not how circuits work. You can easily make a circuit of O(log n) depth that returns base-2 encoded index of the first 1 bit in a n-bit register.

But since n is just 64 here, you're looking at a circuit of AND/OR depth ~8, so it's no surprise that modern CPUs can compute the LZCNT / TZCNT of a 64-bit integer in a single CPU cycle.

2 comments

Note that rustc will not emit this instruction by default, to support older CPUs. Without `target-cpu=haswell` or similar, the operation will be quite slow, but still O(1) of course.
It's not that bad, all x86 processors have the BSR instruction which can be mapped to LZCNT semantics with just a few extra instructions.

Left to right - modern x86-64, baseline x86-64, baseline i686:

https://rust.godbolt.org/z/nGsM35TEo

Maybe you're thinking of POPCNT, which hurts a lot more if it doesn't compile down to the native instruction:

https://rust.godbolt.org/z/xcxG3v4Mn

> That's O(1).

A loop over 2^64 elements is also O(1), but I don't think people are ok to label every program that can run on a 64 bit pc as O(1).

You should really refresh yourself on Big-O notation.