Hacker News new | ask | show | jobs
by holoduke 1022 days ago
Whats the difference between 1% and 99% of HumanEval? What does it tell really?
1 comments

for pass@1 HumanEval tells how well the model solves a task from a set, given only one chance to solve it. It's not the perfect metric, there're other like DS-1000, MBPP (we have included them on HuggingFace model card). HumanEval is good for benchmarking with other models as it gives a fast idea how powerful the model is.
> given only one chance to solve it

my understanding is that there are 2 usages of the pass@{number} syntax. the HumanEval/Codex paper interprets the {number} as number of attempts[0]. however language modelers seem to use it to denote the number of few shot example demonstrations given in the context. these are starkly different and i wish the syntax wasnt overloaded

---

[0] https://arxiv.org/pdf/2107.03374.pdf

> Kulal et al. (2019) evaluate functional correctness using the pass@k metric, where k code samples are generated per problem, a problem is considered solved if any sample passes the unit tests, and the total fraction of problems solved is reported.