by gillesjacobs 818 days ago
The yes-or-no reference-answer test is a really bad way to go about this. Maybe take a page out of the RAGAS evaluation templates and use an LLM to iteratively summarise the nuanced category.
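
To make the suggestion concrete, here is a rough sketch of what "iteratively summarise" could look like. It is only my reading of the idea, not RAGAS's actual templates or API: `iterative_summary`, the prompt wording, and `fake_llm` are all hypothetical names, and `llm` is assumed to be any callable that takes a prompt string and returns text.

```python
from typing import Callable, Iterable


def iterative_summary(
    answers: Iterable[str],
    llm: Callable[[str], str],  # assumption: any prompt -> text function
    category: str,
) -> str:
    """Fold each answer into a running summary instead of a yes/no verdict."""
    summary = ""
    for answer in answers:
        # Hypothetical prompt layout; adapt to whatever template you use.
        prompt = (
            f"Category: {category}\n"
            f"Summary so far: {summary or '(none)'}\n"
            f"New answer: {answer}\n"
            "Update the summary to reflect any nuance the new answer adds."
        )
        summary = llm(prompt).strip()
    return summary


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: appends the new answer to the prior summary."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    prior, new = fields["Summary so far"], fields["New answer"]
    return new if prior == "(none)" else f"{prior}; {new}"


if __name__ == "__main__":
    print(iterative_summary(["mostly correct", "misses edge case"], fake_llm, "accuracy"))
```

Swap `fake_llm` for a real model call and the loop stays the same; the point is that the output is a free-text summary of the category rather than a single yes/no against a reference answer.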