Hacker News
by selim-now 261 days ago
That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.