Why do execution times drop so drastically as the number of iterations increases? Shouldn’t the caches already be filled after one iteration? There is no JIT in C++, or is there?
I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses std::string str(100, '\0'). I'm not a C++ expert, but I believe this essentially does a malloc and memset of 100 bytes for every call to sprintf. So this is probably a poorly set up benchmark.
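To make that concrete, here is a minimal sketch of the difference. It is not the benchmark's actual code: the names format_fresh, format_reused and ns_per_call are made up for illustration, and the only thing taken from the original is the std::string str(100, '\0') pattern. The first variant pays for the allocation and fill inside the timed call, the second one only pays for formatting.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Variant A (hypothetical): allocates and zero-fills a 100-byte string on
// every call, so that cost is part of whatever gets timed.
inline int format_fresh(int value) {
    std::string str(100, '\0');
    return std::snprintf(&str[0], str.size(), "%d", value);
}

// Variant B (hypothetical): writes into a caller-owned buffer that was
// allocated once, so only the formatting itself is timed.
inline int format_reused(std::string& buf, int value) {
    return std::snprintf(&buf[0], buf.size(), "%d", value);
}

// Hypothetical timing helper: average nanoseconds per call of fn(i).
template <typename Fn>
double ns_per_call(int iters, Fn fn) {
    volatile int sink = 0;  // keeps the calls from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        sink = sink + fn(i);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}
```

Comparing ns_per_call(n, [](int i) { return format_fresh(i); }) against the same call with format_reused and a buffer created outside the lambda should show roughly how much of the measured time is allocation rather than formatting. The actual gap depends on the allocator and compiler, so I'm not quoting numbers.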
Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.
Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...
This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python's or Java's), which the "system" languages (C, Rust, Zig, C++, etc.) do not.
What I think you are trying to reference are "runtime optimizations", in which case the answer is probably no. Core C++ and its standard library are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.
* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".
> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
Can I talk to you about our Lord and Savior the CPU trace cache[1]?
That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.
In any event, a printf invocation seems like it should be too large for the cache to come into play. On the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?
Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to go through formatting the same (kind of) thing and nothing else in a tight loop.
> If you want to muddy the waters for contrarianism, [..].
No, and I don’t appreciate the accusation.
> This is clearly not what the OP was asking about.
Eh. I thought this was on topic when I wrote it. On a second read I’m not sure either way. In any case, my point stands, I think: there are things happening that warm up after multiple loop iterations, as is characteristic of JITs and not caches, and one potential source of those things is in fact JITish despite the fact that the translation of C++ into x86-64 has nothing to do with it, even if I’m not sure whether this is the right explanation in this particular case. The general answer to “can JITish things happen to my C++ code” is a definite yes.
Could be dynamic frequency scaling. To minimize the impact of it when benchmarking one can pin the process to a single core and warm it up before running the benchmark.
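A rough sketch of what that could look like on Linux, assuming glibc's sched_setaffinity is available. pin_to_cpu and warm_up are hypothetical helper names, and the warm-up duration is something you would have to tune for your machine.

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t (Linux/glibc specific)
#include <chrono>

// Pins the calling thread to one CPU. Returns false if the kernel refuses,
// for example when that CPU is not in the process's allowed set.
inline bool pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}

// Busy-spins for roughly `ms` milliseconds so the core has a chance to ramp
// up its clock before measurement starts. Returns the spin count.
inline unsigned long warm_up(int ms) {
    using clock = std::chrono::steady_clock;
    auto deadline = clock::now() + std::chrono::milliseconds(ms);
    volatile unsigned long counter = 0;  // volatile so the loop is not removed
    while (clock::now() < deadline) {
        counter = counter + 1;
    }
    return counter;
}
```

Call pin_to_cpu(0) and then something like warm_up(200) before the timed section. Running the binary under taskset -c 0 does the pinning part without code changes. Neither guarantees a fixed clock frequency, it just reduces one source of noise.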