| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jandrewrogers 2749 days ago

For PDEP/PEXT it is the general case of ad hoc and unpredictable bit extract/deposit sequences. A decade ago, I spent a lot of time designing clever libraries that could dynamically effect this but even if you could amortize the overhead of setting up the machinery, it still was ~20 cycles. These instructions eliminated the need to code gen at all, and each instruction runs a lot faster than ~20 cycles. When those instructions showed up with Haswell, it wiped out a lot of code I had written, and in a good way. You can compose them to effect algorithms that would be very complicated (and slow) to implement otherwise.

I've read some things from Intel that suggest PDEP/PEXT were designed for cryptographic applications. However, they are a straightforward implementation of generalized shift networks (there is literature on this), so their potential applications are much broader.

For AES, those instructions have interesting properties for integer manipulation beyond encryption, and even beyond providing the basis for the fastest generic non-cryptographic hash functions currently available for both large and small keys. For example, you can compute a perfect hash (e.g. collision-free hashing from 32-bits to 32-bits) in a few clock cycles for scalar primitives using AES intrinsics. If you understand the construction, which superficially seems like it should not be possible, the result is virtually ideal statistically. Brilliant for hash tables, which still spend a lot of their time hashing, so I am surprised no one seems to be doing it (I figured it out myself, studying the statistical peculiarities of the AES instructions).