The problem with benchmarks is that they don't always represent real-world performance.
Where I work, we sometimes have really big PRs (roughly 5000 lines of code). On GitHub diff pages, scrolling in Chrome is really jerky, while scrolling in Firefox Quantum is smooth. For me, this is much more important than some random JavaScript benchmark.
One thing that these benchmarks don't capture is user-perceived latency. A lot of work went into this for 57, making it feel faster regardless of what a microbenchmark may pick up.
Of course, work is going into making those numbers better too, but I don't think that this kind of benchmarking tells the whole story.
(Also, these mostly seem to be benching JavaScript performance? That's certainly not the whole story of 57.)