| HN Mirror



	by Breza 349 days ago
	I agree. Public benchmarks aren't very useful for a bunch of reasons. Any company relying on LLMs for a critical function should have its own internal benchmark system. I maintain such a system for my job. If you can, use the same prompt every time, so results stay comparable across models and over time. It's fun to be able to include models like the original Bard on our leaderboard.
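
For illustration, here is a minimal sketch of what an internal benchmark along these lines could look like. It is not the system described in the comment. Everything named here (`FIXED_PROMPT`, `CASES`, `exact_match`, `run_benchmark`, and the two stub models) is hypothetical; in practice the stubs would be replaced by real model calls and the cases by tasks that matter to your business.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fixed prompt template: reused unchanged for every model and run,
# so scores stay comparable across models and over time.
FIXED_PROMPT = "Answer with a single word. Question: {question}"

# Hypothetical test cases: (question, expected answer).
CASES: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("What is the chemical symbol for gold?", "Au"),
]


def exact_match(output: str, expected: str) -> bool:
    """Case-insensitive match after trimming whitespace and trailing periods."""
    return output.strip().rstrip(".").lower() == expected.lower()


def run_benchmark(models: Dict[str, Callable[[str], str]]) -> List[Tuple[str, float]]:
    """Run every model on the same prompts; return (name, accuracy), best first."""
    board = []
    for name, model in models.items():
        correct = sum(
            exact_match(model(FIXED_PROMPT.format(question=q)), expected)
            for q, expected in CASES
        )
        board.append((name, correct / len(CASES)))
    return sorted(board, key=lambda row: row[1], reverse=True)


# Stub models standing in for real API calls (assumptions, illustration only).
def stub_good(prompt: str) -> str:
    return "Paris." if "France" in prompt else "Au"


def stub_weak(prompt: str) -> str:
    return "Paris" if "France" in prompt else "Ag"


leaderboard = run_benchmark({"stub-weak": stub_weak, "stub-good": stub_good})
for name, accuracy in leaderboard:
    print(f"{name}: {accuracy:.0%}")
```

Because the prompt and cases are frozen, a retired model's stored score can stay on the board next to newer ones, which is what makes keeping old entries around meaningful.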