|
|
|
|
|
by stabbles
1979 days ago
|
|
Oh come one now, don't be silly. Think about vectorization and loop unrolling. It _always_ does a memory load with an offset as a single CPU instruction. As an example a 4 times unrolled sum of doubles on AVX looks like this: L64:
vaddpd ymm0, ymm0, ymmword ptr [rcx + 8*rsi]
vaddpd ymm1, ymm1, ymmword ptr [rcx + 8*rsi + 32]
vaddpd ymm2, ymm2, ymmword ptr [rcx + 8*rsi + 64]
vaddpd ymm3, ymm3, ymmword ptr [rcx + 8*rsi + 96]
add rsi, 16
cmp rdx, rsi
jne L64
The `rcx + 8*rsi + 32` stuff is offsets the compiler generates. Don't even think about worrying about -1 here... |
|
Don't be snarky.
I mentioned it's theoretically possible and outlined the specific conditions for optimizations to happen.
But I also mentioned that in reality various things often prevent compilers (assuming it's a compiled language, and not interpreted, like that LUA implementation) from applying such optimizations:
- you're programming against an API or are compiling an API (i.e. non-static functions).
- you have time-constraints for emitting optimized code, like most JITs - you can't afford any deep analysis that would enable such optimizations, you're mostly pattern-matching for lower hanging fruits.
Since pictures say more than a thousand words, here's the actual disassembly of that lua function I linked earlier, as shipped by my Linux distribution: https://i.imgur.com/DnqVC8E.png
The green stuff is the "fast path". For 0-based indexing you would completely drop the first LEA and replace those last MOV/SHL/LEA with a (faster?) MOV r,r/SHL/ADD r,m - which is what GCC is likely to generate.
So that's Lua (one implementation at least) in practice. No compiler magic making arrays fast here.
LuaJIT (from my current reading) goes the different route of just pretending the first array element doesn't exist, meaning that in memory they have an element '0', but they simply don't use it. This trades some memory for speed.
From lj_tab.c:
Personally I quite like this approach. A lot better than hoping compilers will magically fix everything and quite hard to argue with, considering LuaJIT's performance. Though it would be a ludicrous design for a lower-level language.