by hgaddipa001 285 days ago
We did a lot of internal testing, but we haven't run an official benchmark.

We find that the less the agent knows, the more it hallucinates.