It's a lot of work to fix this benchmark, which is fairly contrived to begin with. To start, you have to adjust every sample to accept a warmup time and then a run time (which likely covers multiple samples), measuring results, both speed and performance, within that run time. You also have to be careful that the compiler doesn't optimize out the repetitions during the run time, while still allowing the optimizations that would produce the best performance.
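For what it's worth, here is roughly the shape I have in mind, as a minimal Python sketch rather than anything from the actual benchmark. The names `run_benchmark` and `parse_and_sum`, the default durations, and the JSON document are all made up for illustration. The `sink` accumulator stands in for "make sure the work is actually used"; in a compiled language you would reach for an optimization barrier instead (for example `std::hint::black_box` in Rust).

```python
import json
import statistics
import time


def run_benchmark(workload, warmup_s=0.2, run_s=0.5, min_samples=5):
    """Warm up for warmup_s seconds, then time calls to workload() for about run_s seconds."""
    sink = 0  # every result is folded in here so the work is observably used

    # Warmup phase: calls are not timed, but their results are still consumed.
    deadline = time.perf_counter() + warmup_s
    while time.perf_counter() < deadline:
        sink ^= hash(workload())

    # Measurement phase: one timing sample per call, until the time budget
    # is spent and at least min_samples samples have been collected.
    samples = []
    deadline = time.perf_counter() + run_s
    while time.perf_counter() < deadline or len(samples) < min_samples:
        start = time.perf_counter()
        result = workload()
        samples.append(time.perf_counter() - start)
        sink ^= hash(result)

    return {
        "samples": len(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "sink": sink,
    }


# Hypothetical workload: parse a small JSON document and walk it.
DOC = json.dumps({"items": [{"id": i, "value": i * 2} for i in range(1000)]})


def parse_and_sum():
    data = json.loads(DOC)
    return sum(item["value"] for item in data["items"])


if __name__ == "__main__":
    print(run_benchmark(parse_and_sum))
```

Reporting the median and minimum over many samples, instead of one wall-clock number for the whole process, is the part that makes the result less sensitive to startup cost and one-off stalls.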
If I had a nickel for every time I've seen a language benchmark turn out to be a very specialized, contrived problem (in this case, a specific JSON document with a specific access pattern), I'd have a lot of nickels.