Hacker News
by goldenarm 23 days ago
It's a gibberish-input detection benchmark; it does not measure output hallucinations.