| HN Mirror



	by amelius 466 days ago
	Aren't there any "blind" benchmarks?

2 comments

nathanasmith 466 days ago

Unfortunately, that wouldn't help as much as you'd think, since talented AI labs can just watch the public leaderboard, note which models move up and down, and use that to deduce and target whatever the hidden benchmark is testing.


nickthegreek 466 days ago

OpenRouter Arena Ratings are probably the closest thing.
