| HN Mirror


	by zen4ttitude 138 days ago
	Does anyone know more about the benchmark? 60% accuracy gets a drumroll? How would Claude do? How would a human do? I tried the previous version and was not impressed. I went back to Claude, which is very hard to beat and versatile with context enrichment.