by diamondlovesyou
932 days ago
AMD's string store is not like Intel's. Generally, you don't want to use it until you are past the CPU's L2 size (L3 is a victim cache), which makes ~2k WAY too small a threshold. Once past that point, string store becomes profitable and should run at "DRAM speed". But it has a high startup cost, so 256-bit vector loads/stores should be used until that threshold is met.
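To make the dispatch concrete, here is a rough sketch of a memset that uses 256-bit stores below a size threshold and `rep stosb` above it. Everything named here (`sketch_memset`, `fill_avx2`, `fill_rep_stosb`, `REP_STOS_THRESHOLD`) is made up for illustration, and the 512 KiB threshold is only an assumed L2 size, not a measured value. Real code would query the cache size and benchmark on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_PATHS 1
#else
#define HAVE_X86_PATHS 0
#endif

/* ASSUMPTION: 512 KiB stands in for "the CPU's L2 size".
 * This is a placeholder, not a tuned value. */
static const size_t REP_STOS_THRESHOLD = 512 * 1024;

#if HAVE_X86_PATHS
/* Fill with unaligned 256-bit stores, then finish the tail bytewise. */
__attribute__((target("avx2")))
static void fill_avx2(unsigned char *dst, int c, size_t n)
{
    __m256i v = _mm256_set1_epi8((char)c);
    size_t i = 0;
    for (; i + 32 <= n; i += 32)
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    for (; i < n; i++)
        dst[i] = (unsigned char)c;
}

/* String store: RDI = destination, RCX = count, AL = fill byte.
 * The ABI guarantees the direction flag is clear on entry. */
static void fill_rep_stosb(unsigned char *dst, int c, size_t n)
{
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(c)
                     : "memory");
}
#endif

void *sketch_memset(void *dst, int c, size_t n)
{
    assert(dst != NULL || n == 0);
#if HAVE_X86_PATHS
    if (n >= REP_STOS_THRESHOLD) {
        /* Large fill: the startup cost is amortized. */
        fill_rep_stosb(dst, c, n);
        return dst;
    }
    if (__builtin_cpu_supports("avx2")) {
        /* Below the threshold: stay on vector stores. */
        fill_avx2(dst, c, n);
        return dst;
    }
#endif
    /* Non-x86 build or no AVX2: defer to libc. */
    return memset(dst, c, n);
}
```

The only point of the sketch is the shape of the branch: the threshold sits at roughly L2 size rather than at ~2k.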
> With the new Zen3 CPUs, Fast Short REP MOV (FSRM) is finally added to AMD’s CPU functions analog to Intel’s X86_FEATURE_FSRM. Intel had already introduced this in 2017 with the Ice Lake Client microarchitecture. But now AMD is obviously using this feature to increase the performance of REP MOVSB for short and very short operations. This improvement applies to Intel for string lengths between 1 and 128 bytes and one can assume that AMD’s implementation will look the same for compatibility reasons.
https://www.igorslab.de/en/cracks-on-the-core-3-yet-the-5-gh...
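If you want to check whether a given CPU advertises the feature, a minimal sketch from user space follows. FSRM is reported in CPUID leaf 7, subleaf 0, EDX bit 4, and the older ERMS flag is in EBX bit 9 of the same leaf. The function names `has_fsrm` and `has_erms` are made up for this example, and it assumes GCC or Clang with `<cpuid.h>`. Note that the bit only says the feature is advertised; it says nothing about where the profitable size threshold lies on a particular part.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Returns 1 if CPUID.(EAX=7,ECX=0):EDX[4] (FSRM) is set, else 0. */
int has_fsrm(void)
{
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 when leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 4) & 1u);
#else
    return 0;
#endif
}

/* Returns 1 if CPUID.(EAX=7,ECX=0):EBX[9] (ERMS) is set, else 0. */
int has_erms(void)
{
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}
```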