| HN Mirror



	by nicebyte 485 days ago
	How did you draw that conclusion from reading the contents of the link? This is a benchmark.

	> We evaluate model performance and find that frontier models are still unable to solve the majority of tasks.