by grog454
176 days ago
I guess there are two things I'm still stuck on: 1. What is the purpose of the benchmark? 2. What is the purpose of publicly discussing a benchmark's results but keeping the methodology secret? To me it's in the same spirit as claiming to have defeated alpha zero but refusing to share the game.

2. I discussed that up-thread, but https://github.com/microsoft/private-benchmarking and https://arxiv.org/abs/2403.00393 discuss some further motivation for this if you are interested.
> To me it's in the same spirit as claiming to have defeated alpha zero but refusing to share the game.
This is an odd way of looking at it. There is no "winning" at benchmarks; a benchmark is simply a better and more repeatable evaluation than the old "vibe test" that people did in 2024.