|
|
|
|
|
by jstimpfle
21 days ago
|
|
I asked where is the part about unaligned pointers in your string processing example. Saying that you want to load multiple bytes at a time does not imply at all that you have to do unaligned loads. Doing unaligned loads using SSE or AVX might have been possible on Intel architectures for a long time, but it is still a little bit slower afaik. But anyway when you get into sub-architecture specific details like that, you've essentially left C-land, and you're essentially doing assembler level programming. |
|
Using unaligned loads, you can get rid of the prelude, including the associated branches and intptr arithmetic, and just have to deal with the tail.
If you're comparing short-ish strings, almost all of the time is spent in the prelude and postlude, even if the entire substring fits in a register. This is a silly language limitation when the hardware can actually easily just support the unaligned load.
In particular, it doesn't seem justified that what at most amounts to a tiny inefficiency in hardware turns into a very expensive class of bugs (UB).