Except it is not a loop[1] but a non-tail-recursive function (with two recursive calls). If the compiler can optimise that into a loop, that is actually impressive.
Function call overhead (and, in the slower cases, interpreter overhead) is most likely what is being measured, even in the best case.
[1] In that it won't be a jump-and-check style loop.
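To make the distinction concrete, here is a rough sketch in C. It assumes the benchmarked function is the usual doubly recursive Fibonacci, which the comment above does not actually state (it only says non-tail recursive with two calls). The names fib_recursive and fib_loop are made up for illustration and are not taken from the blog post.

```c
#include <stdint.h>

/* Assumed shape of the benchmarked function: two recursive calls per
 * invocation, neither in tail position, because the addition still
 * has to happen after both calls return. */
uint64_t fib_recursive(unsigned n) {
    if (n < 2) {
        return n;
    }
    return fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* The "jump and check" form: one counter, one comparison, one backward
 * jump per iteration, and no calls at all. */
uint64_t fib_loop(unsigned n) {
    uint64_t a = 0; /* fib(i)     */
    uint64_t b = 1; /* fib(i + 1) */
    for (unsigned i = 0; i < n; ++i) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Both return the same values, but the first does an exponential number of calls while the second does a linear number of additions, so a compiler would need to do more than plain tail-call elimination to turn one into the other.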
Quite likely: https://godbolt.org/g/xGkCyA, although it really depends on the compiler (gcc is smarter here). He also seems to be benchmarking printing more than anything else (hence the difference between C and C++).
Without the compiler versions and an explanation of how the measurements were taken, the results posted on the blog are more anecdote than any kind of benchmark.