by flumes_whims_
30 days ago
Benchmarking for answering "I don't know" rather than giving a wrong answer seems like the right way to steer the industry towards models that are good at this. AA-Omniscience is one such benchmark: a knowledge and hallucination benchmark that rewards accuracy, punishes bad guesses, and provides a comprehensive view of which models produce factually reliable outputs across different domains.
The benchmark contains 6,000 questions across 6 major domains, derived from authoritative academic and industry sources and generated automatically by an LLM-based question generation agent to ensure unambiguity, scalability, and factual precision: https://artificialanalysis.ai/evaluations/omniscience
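To make the "punishes bad guesses" idea concrete, here's a rough sketch of an abstention-aware score. The weights (+1 for a correct answer, -1 for a wrong one, 0 for "I don't know") and the function name are my own assumptions for illustration, not taken from the benchmark page, so check the link for the actual scoring rule.

```python
# Hypothetical scoring rule: the weights below are assumptions for illustration.
WEIGHTS = {"correct": 1, "incorrect": -1, "abstain": 0}


def abstention_aware_score(outcomes):
    """Return a score in [-100, 100] for a list of per-question outcomes.

    Each outcome must be "correct", "incorrect", or "abstain".
    Wrong answers subtract from the score; abstentions leave it unchanged.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    total = sum(WEIGHTS[outcome] for outcome in outcomes)
    return 100 * total / len(outcomes)


# Two models that both know 40 of 100 answers:
guesser = ["correct"] * 40 + ["incorrect"] * 60    # always answers
abstainer = ["correct"] * 40 + ["abstain"] * 60    # says "I don't know" otherwise

print(abstention_aware_score(guesser))
print(abstention_aware_score(abstainer))
```

Under a rule like this, the model that guesses on everything scores below the one that abstains when unsure, even though both know the same number of answers. Plain accuracy would rank them as equal.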