Hacker News
by selim-now, 261 days ago
That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always use an LLM as a crutch to some degree.