Hacker News
by oshrimpton 6 days ago
Yeah, the benchmark for sure isn't perfect, and without super rigid prompting it is far too easy for it to get off course. A 28% hallucination rate isn't nothing either.