by epolanski 321 days ago
How does it perform when you run it multiple times?

LLMs are non-deterministic, so I think benchmarks should report averages over N runs rather than single-shot results.
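Here's a minimal sketch of what reporting over N runs could look like. It assumes each run produces a single score in [0, 1]. `summarize_runs` and `fake_llm_run` are hypothetical names. The fake run doesn't call a model. It just simulates one that answers each of 50 questions correctly with probability 0.7. The point is to report a mean plus its standard error instead of one number.

```python
import random
import statistics
from typing import Callable


def summarize_runs(run_once: Callable[[], float], n: int) -> tuple[float, float]:
    """Run a benchmark n times; return (mean score, standard error of the mean)."""
    if n < 2:
        raise ValueError("need at least 2 runs to estimate spread")
    scores = [run_once() for _ in range(n)]
    mean = statistics.fmean(scores)
    # Sample standard deviation divided by sqrt(n) gives the standard error of the mean.
    sem = statistics.stdev(scores) / n ** 0.5
    return mean, sem


# Hypothetical stand-in for one non-deterministic LLM benchmark run:
# 50 questions, each answered correctly with probability 0.7, scored as accuracy.
rng = random.Random(0)


def fake_llm_run() -> float:
    return sum(rng.random() < 0.7 for _ in range(50)) / 50


mean, sem = summarize_runs(fake_llm_run, n=20)
print(f"accuracy = {mean:.3f} ± {sem:.3f} (mean ± SEM over 20 runs)")
```

A single run of `fake_llm_run` can easily land several points away from 0.7. The spread across runs is exactly what a single-shot number hides.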