HN Mirror



	by Banditoz 57 days ago
	If the benchmarks are private, how do we reproduce the results? I looked up Humanity's Last Exam (https://agi.safe.ai/), which this model is evaluated on, and I can't seem to access it.

1 comment

The test data is purposely difficult to access to reduce the chance of leaking it into the training dataset.