|
|
|
|
|
by crest
1655 days ago
|
|
Now compare how many different versions of the functions are required for the dozens of possible x86 extensions (and combinations of them) and all the prologue/epilogue code required to watch out for page boundaries and unaligned pointers and as well as the length of the inner loop to handle all the packing/unpacking and cobbeling together horizontal operations to the required masks and turn somehow use them for flow control where needed. It's enough code to put painful pressure on the instruction cache and requires wide OoO superscalar CPU cores to be worth the overhead compare the code in the RISC V vector spec with this strcmp https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/m... and tell me it's a clean and straightforward implementation using the instruction set as intended and not an ugly hack around its limitations. |
|