by PoignardAzur
967 days ago
That kind of benchmark is a lot more reliable for models published before the benchmarks; models published afterwards have more opportunity to "study to the test". That's especially a concern when a company explicitly uses its score on that benchmark as a marketing point.