Hacker News


	by amrb 1191 days ago
	I'd like to see a yearly benchmark for models. It could be logic puzzles or a suite of tasks, but as it stands there is no good way to measure the ability of models.