by 7e
629 days ago
With Artificial Analysis, I wonder whether model tweaks are detectable. That’s the benefit of a standardized benchmark: you’re testing the hardware. If some inference vendor changes Llama under the hood, the changes are known. And of course, if you don’t include precise repro instructions in your standardized benchmark, nobody can tell how much money you’re losing (that is, how many chips are serving your requests).