by nabakin
75 days ago
Public benchmarks can be trivially faked. Lmarena is a bit harder to fake and is human-evaluated. I agree it's misleading for them to hyper-focus on one metric, but public benchmarks are far from the only thing that matters. I place more weight on Lmarena scores and private benchmarks.