Hacker News
by goldenarm 23 days ago
It's a gibberish-input detection benchmark; it does not measure output hallucinations.