by BeeOnRope
2752 days ago
Thanks, that is really interesting. It is hard to believe that pdep/pext alone could result in a 10x throughput improvement, but I acknowledge it is possible, since that is one instruction that is very slow to emulate in the general case, and if you needed exactly that... It actually isn't clear to me exactly what Intel was targeting with that pair of instructions, but they sure are useful in all sorts of scenarios.

> The only other non-standard instructions with similar value are the AES intrinsics

If I can ask, what are the interesting uses outside of encryption? The main use I am aware of is as a handy, fast, high-quality hash function implemented in hardware (and you don't need all the rounds when you are just after quality, and not adversarial collision resistance).
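To illustrate the "slow to emulate in the general case" point, here is a minimal portable sketch of what PEXT and PDEP compute, written in C. The names `pext_soft`, `pdep_soft` and `pext_pdep_demo` are made up for this example and are not from any library. The loop runs once per set bit in the mask, so an arbitrary 64-bit mask can cost up to 64 iterations, whereas the hardware instruction is a single operation.

```c
#include <assert.h>
#include <stdint.h>

/* Software PEXT: gather the bits of src selected by mask and pack them,
   in order, into the low bits of the result. */
uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t out_bit = 1;                       /* next output position */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);   /* isolate lowest set mask bit */
        if (src & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask &= mask - 1;                       /* clear that mask bit */
    }
    return result;
}

/* Software PDEP: scatter the low bits of src, in order, to the positions
   of the set bits in mask. */
uint64_t pdep_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t in_bit = 1;                        /* next input position */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);   /* isolate lowest set mask bit */
        if (src & in_bit) {
            result |= lowest;
        }
        in_bit <<= 1;
        mask &= mask - 1;                       /* clear that mask bit */
    }
    return result;
}

/* Hypothetical demo: extract one byte, then deposit it back in place. */
void pext_pdep_demo(void) {
    uint64_t x = 0x12345678u;
    uint64_t mask = 0x0000FF00u;
    uint64_t packed = pext_soft(x, mask);       /* bits 8..15 of x, packed low */
    assert(packed == 0x56u);
    assert(pdep_soft(packed, mask) == (x & mask));
}
```

On a CPU with BMI2, the same results should come from the `_pext_u64` and `_pdep_u64` intrinsics in `<immintrin.h>`, without the per-bit loop.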
|
I've read some things from Intel that suggest PDEP/PEXT were designed for cryptographic applications. However, they are a straightforward implementation of generalized shift networks (there is literature on this), so their potential applications are much broader.
As for AES, those instructions have interesting properties for integer manipulation beyond encryption, and even beyond providing the basis for the fastest generic non-cryptographic hash functions currently available for both large and small keys. For example, you can compute a perfect hash (e.g. collision-free hashing from 32 bits to 32 bits) in a few clock cycles for scalar primitives using AES intrinsics. The construction superficially seems like it should not be possible, but once you understand it, the result is virtually ideal statistically. It is brilliant for hash tables, which still spend a lot of their time hashing, so I am surprised no one seems to be doing it (I figured it out myself, studying the statistical peculiarities of the AES instructions).