|
|
|
|
|
by da-x
6 days ago
|
|
Maybe someone can devise a distributed bench-marking system where multiple people collaborate on tests and also vet each other's tests and rating without revealing them to the public. I have my own "interview questions" for models where I give them a premade Git repo and a problem to solve. Then, I rate them like a teacher. I believe other do that as well, so we only need a reliable system to aggregate these results. |
|
The only way to make it fair is to have the model provider give some benchmarking org the weights + inference engine, so that the model can be run in complete isolation and no information about the benchmark is leaked.
Though I guess for a 'random' person's benchmark that hides between all other requests it's probably ok.