While we are on the subject of stupid microbenchmarks: take the following loop in C (line is a struct, length is an int):
for (int i = 1; i <= 400000000; ++i) {
    ++line.length;
}
It compiles to the following ASM:
@8:
    inc eax
    dec edx
    jne @8
...as expected. The compiler realized that there is no need to write the intermediate values back to line.length and does everything in registers.
Now here is the same loop in Lua, where everything is dynamically typed and line.length must (in theory) be looked up in a hash table, just like in Python:
for i = 1, 400000000 do
    line.length = line.length + 1
end
LuaJIT 2.1 generates the following ASM:
->LOOP:
    addsd xmm7, xmm0
    movsd [rax], xmm7
    add edi, +0x01
    cmp edi, 0x17d78400
    jle 0x7fee13effe0 ->LOOP
The C program executes in ~0.3s and the Lua one in ~0.5s, and those 0.5s include LuaJIT's startup, profiling, and JIT compilation time. So for a longer-running program the difference would be even smaller.
TL;DR: modern JIT compilers are amazing and can optimize away the things mentioned in the article.
In fact, without the printf all you get is "repz ret" (not counting the .init overhead in the binary). That is because the compiler detects that line.length is never used and does not even set it.
#include <stdio.h>
int main(int argc, char *argv[])
{
    struct lines
    {
        char *somedata;
        int length;
    } line;
    int i;
    line.length = 0;
    for (i = 0; i <= 400000000; ++i)
    {
        ++line.length;
    }
    printf("Line len %d\n", line.length);
}
I did not. I used Pelles C to compile it (with the optimizer turned on). I am not surprised that GCC managed to eliminate the pointless loop entirely in this situation. In fact, I was happy that neither Pelles C nor LuaJIT realized that the whole loop was pointless, so I did not have to come up with a more complex example.
The primary point of this was to show that JIT compilers can optimize away hash table access and dynamic typing in hot loops, not a code generation competition between LuaJIT and GCC.
I am a pretty big LuaJIT fanboy and not even I would claim that Lua compiled with LuaJIT can compete against C compiled by current versions of GCC with all optimizations turned on, at least not in most real-world cases. However, it does get amazingly close, most of the time within an order of magnitude, which means that I personally do not need to use C anymore.