by chmod775
1979 days ago
>I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled. Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something. In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.

Think about vectorization and loop unrolling. The compiler _always_ emits a memory load with an offset as a single CPU instruction, because the offset is part of the instruction's addressing mode.
As an example, a 4x unrolled sum of doubles on AVX looks like this:
The `rcx + 8*rsi + 32` parts are offsets the compiler generates anyway. Don't even think about worrying about the -1 here...
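
If you want to check this yourself, here is a minimal C sketch (my own, not the listing from above; `at1` and `sum_1based` are hypothetical names) of a 1-based accessor used in a manually 4x unrolled sum. The assumption is an x86-64 target, where a load can address `base + 8*index + displacement` in one instruction, so the `- 1` would typically just change the constant displacement by 8 bytes rather than cost a separate subtraction.

```c
#include <stddef.h>

/* Hypothetical 1-based accessor: element i is stored at a[i - 1]. */
static inline double at1(const double *a, size_t i) {
    return a[i - 1];
}

/* Sum elements 1..n (1-based) using four independent accumulators,
 * i.e. a manual 4x unroll of the loop. */
double sum_1based(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 1;

    /* Main loop: each access is a load at a constant offset from a + 8*i. */
    for (; i + 3 <= n; i += 4) {
        s0 += at1(a, i);
        s1 += at1(a, i + 1);
        s2 += at1(a, i + 2);
        s3 += at1(a, i + 3);
    }

    /* Tail: handle the remaining 0..3 elements. */
    for (; i <= n; ++i)
        s0 += at1(a, i);

    return (s0 + s1) + (s2 + s3);
}
```

Compiling with something like `gcc -O3 -mavx2 -S` and reading the loop body should show whether the offsets end up folded into the load operands on your compiler; the exact registers and constants will differ from the ones quoted above.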