Hacker News
by moyix, 933 days ago
At least they used HumanEval+, which adds a bunch more test cases and fixes some errors in the original benchmark!