|
|
|
|
|
by dzaima
134 days ago
|
|
RVV still very much requires you to write a manual code/assembly loop doing the "compute how many elements can be handled, decrease count by that, repeat if necessary" thing. All it does is make it slightly less instructions to do so (and also allows handling a loops tail in the same loop while at it). |
|
IIRC libc for x64 has several implementations of memcpy/memmov/strlen/etc. for different SSE/AVX extensions, which all get compiled in and shipped to your system; when libc is loaded for the first time, it figures out what is the latest extension the CPU it's running on actually supports and then patches its exports to point to the fastest working implementations.