| HN Mirror


	by DetroitThrow 85 days ago
	Um, yes, this is an extremely specific benchmark harness. It has a ton of knowledge encoded about the tasks at hand. The tweet is dishonest even in the best light. The hard part of these tests isn't purely reasoning ability ffs.