by meru_2025 493 days ago
A dynamic human-in-the-loop evaluation benchmark is great for preventing data contamination and test saturation. Worth my time to read.