


	by 7e 629 days ago
	With Artificial Analysis, I wonder whether model tweaks are detectable. That’s the benefit of a standardized benchmark: you’re testing the hardware. If some inference vendor changes Llama under the hood, the changes are known. And of course, if you don’t include precise repro instructions in your standardized benchmark, nobody can tell how much money you’re losing (that is, how many chips are serving your requests).