| HN Mirror

	by djfergus 49 days ago
	We need a benchmark that tests a model's ability to do LLM research.