| HN Mirror

	by selim-now 261 days ago
	That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.