|
|
|
|
|
by bri3d
504 days ago
|
|
And, as noted in the article, that's an instruction which only works on two desktop CPU architectures (Tiger Lake and Zen 5), including one where it's arguably slower than not using it (Tiger Lake). Meaning... this entire effort was for something that's faster on only a single kind of CPU (Zen 5). This article is honestly one of the best I've read in a long time. It's esoteric and the result is 99.5% pointless objectively, but in reality it's incredibly useful and a wonderful guide to low-level x86 optimization end to end. The sections on cache alignment and uiCA + analysis notes are a perfect illustration of "how it's done." |
|