by nxobject 512 days ago
Spoiler if you don't want to read through the (wonderful but many) paragraphs of exposition: the instruction is `vp2intersectq k, zmm, zmm`.
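
For anyone who hasn't met the instruction: as far as I understand Intel's documentation, it compares every element of one vector against every element of the other and writes a pair of masks marking which lanes on each side found a match. A rough scalar model in Python, purely as a sketch (the function name `vp2intersect` and the 8-lane example are my own illustration, not anything from the article):

```python
def vp2intersect(a, b):
    """Scalar model of the VP2INTERSECT all-pairs comparison.

    Assumption: a and b are equal-length lists of integers standing in
    for the lanes of two vector registers (8 lanes for 64-bit elements
    in a zmm register). Returns (k1, k2) as integer bitmasks:
    bit i of k1 is set if a[i] equals some element of b,
    bit j of k2 is set if b[j] equals some element of a.
    """
    k1 = 0
    k2 = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k1 |= 1 << i  # lane i of the first operand matched
                k2 |= 1 << j  # lane j of the second operand matched
    return k1, k2


# Hypothetical 8-lane example: the values 1, 3 and 8 appear in both inputs.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 0, 0, 3, 0, 0, 0, 1]
k1, k2 = vp2intersect(a, b)
print(f"{k1:08b} {k2:08b}")
```

The hardware does all 64 comparisons in a single instruction, which is why it is attractive for things like sorted-set intersection.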

And, as noted in the article, that's an instruction which only works on two desktop CPU architectures (Tiger Lake and Zen 5), including one where it's arguably slower than not using it (Tiger Lake).

Meaning... this entire effort was for something that's faster on only a single kind of CPU (Zen 5).

This article is honestly one of the best I've read in a long time. It's esoteric and the result is 99.5% pointless objectively, but in reality it's incredibly useful and a wonderful guide to low-level x86 optimization end to end. The sections on cache alignment and uiCA + analysis notes are a perfect illustration of "how it's done."

Presumably Zen 5 cores will also get used in Threadripper and EPYC processors.

Yep. And the feature will probably be available on all AMD CPUs manufactured from here on.

It might be an esoteric feature today. But if it becomes a ubiquitous feature in a few years, it's nice to learn about using it.

Not just that: Intel CPUs execute it 20-30 times slower than AMD Zen 5 CPUs.

Also, Intel has deprecated it.

Now the question is whether Intel will revive it now that Zen 5 has it.