by throawayonthe 32 days ago
well there is https://artificialanalysis.ai/evaluations/omniscience

It's a gibberish input detection benchmark, and does not measure output hallucinations.