Hacker News
by andai 181 days ago
This model has the best score on that benchmark.

Edit: Huh... It does score highest in "Omniscience", but it also scores very high in Hallucination Rate (where a higher score is worse)...

1 comment

This has one of the worst scores in AA-Omniscience Hallucination Rate.