https://github.com/lattera/glibc/blob/master/string/strlen.c
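The generic C version linked above avoids a byte-by-byte loop. Instead it loads a whole machine word and tests it for a zero byte with a variant of the magic-constant bit trick. Below is a minimal sketch of that idea. It is not glibc's code and rests on a few assumptions:

- the word is 64 bits;
- the zero test is the classic exact form `(w - 0x0101...) & ~w & 0x8080...`;
- `strlen_wordwise` and `has_zero_byte` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero iff at least one byte of w is 0x00
   (exact "haszero" test, no false positives for the word as a whole). */
static int has_zero_byte(uint64_t w)
{
    const uint64_t lo = 0x0101010101010101ULL;
    const uint64_t hi = 0x8080808080808080ULL;
    return ((w - lo) & ~w & hi) != 0;
}

/* Hypothetical word-at-a-time strlen sketch. */
size_t strlen_wordwise(const char *s)
{
    const char *p = s;

    /* Check single bytes until p is 8-byte aligned. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Test 8 bytes per iteration. An aligned 8-byte load never crosses a
       page boundary, so reading past the terminator stays on a mapped page
       (real implementations rely on this; ISO C does not guarantee it). */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* The zero byte is somewhere in this word: find its exact position. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Because the zero test itself is endianness-independent, the final byte scan is the simplest way to locate the terminator without byte-order-specific bit counting.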
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...
The pcmpeqb instruction is from SSE2, not SSE4 (its 64-bit form on MMX registers is even older), and on XMM registers it compares 16 bytes in a single instruction.
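The sketch below shows what comparing 16 bytes per instruction buys, as a minimal strlen written with SSE2 intrinsics:

- `_mm_cmpeq_epi8` compiles to pcmpeqb.
- `_mm_movemask_epi8` compiles to pmovmskb, which packs the 16 comparison results into an int.

It assumes x86-64, where SSE2 is always available, and GCC or Clang for `__builtin_ctz`. `strlen_sse2` is a hypothetical name. glibc's actual assembly is considerably more elaborate than this.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 strlen sketch (x86-64, GCC/Clang). */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round down to a 16-byte boundary: aligned 16-byte loads never
       straddle a page boundary, so over-reading cannot fault. */
    uintptr_t offset = (uintptr_t)s & 15;
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);

    /* pcmpeqb: each byte becomes 0xFF where it equals 0x00.
       pmovmskb: collect the 16 byte sign bits into a 16-bit mask. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= offset;  /* drop bits for bytes that precede s */
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Scan the rest 16 bytes at a time. */
    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

The lowest set bit of the mask marks the first zero byte in the block, which is why counting trailing zeros gives its index directly.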