Hacker News
by meru_2025 493 days ago
A dynamic, human-in-the-loop evaluation benchmark is a great way to prevent data contamination and test saturation. It was worth my time to read.