| HN Mirror

	by hyperpape 11 days ago
	We need a companion to "IN MICE", which is "IN EVALS". I don't think this is bad research, but you have to understand how far it generalizes. I'm not saying that evals are useless; we need to do our best to produce good benchmarks. But benchmarks are always going to lag pretty far behind real-world applications.