| HN Mirror


	by wongarsu 4 hours ago
	Tbf, most of the "real benchmarks" have issues that are just as bad. Assessing LLM performance is just hard.