Hacker News
by anonym29 26 days ago
Per Artificial Analysis's (AA) Omniscience Index benchmark, the "non-hallucination rate" subcomponent (1 - hallucination rate) is 4% for DS4F vs. 66% for M2.7.

https://artificialanalysis.ai/leaderboards/models?weights=op...
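Because the subcomponent is defined as 1 - hallucination rate, the quoted figures can be flipped back into hallucination rates. This is a minimal Python sketch of that arithmetic. It uses only the two numbers from the post, and the function name `hallucination_rate` is illustrative, not part of any AA tooling.

```python
# Non-hallucination rates quoted in the post, as fractions.
non_hallucination = {"DS4F": 0.04, "M2.7": 0.66}

def hallucination_rate(non_hallucination_rate: float) -> float:
    """Invert the relation: non-hallucination rate = 1 - hallucination rate."""
    return 1.0 - non_hallucination_rate

for model, nh in non_hallucination.items():
    # Format as a whole-number percentage.
    print(f"{model}: hallucination rate = {hallucination_rate(nh):.0%}")
# DS4F: hallucination rate = 96%
# M2.7: hallucination rate = 34%
```

So the post's 4% vs. 66% corresponds to a 96% vs. 34% hallucination rate on the same benchmark.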

1 comment

On the same page, DS4F scores much better on Omniscience Accuracy, so I would take those numbers with a grain of salt. For instance, I ran several different benchmarks against Qwen 3.6 27B and DS4F quantized to 2-bit, and DS4F's hallucination rate was much lower. In general I find Artificial Analysis benchmarks poorly aligned with what I see in the field, and in this specific case, where I ran many tests, the mismatch is even larger.