Hacker News
by mityamitya 1016 days ago
Hi! We ran LSH (locality-sensitive hashing) filtering over the datasets to remove any code that could be similar to HumanEval samples.
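The comment doesn't describe the exact LSH setup, so the following is only a rough illustration of how MinHash + LSH banding can be used for this kind of decontamination. It is written in pure Python. The shingle size, 128 permutations, 32 bands of 4 rows, and the helper names (`shingles`, `minhash`, `band_keys`, `decontaminate`) are all hypothetical choices, not Refact's actual pipeline.

```python
import hashlib
import re

# Hypothetical settings, not Refact's actual configuration.
SHINGLE_SIZE = 5           # tokens per shingle
NUM_PERM = 128             # MinHash signature length
BANDS, ROWS = 32, 4        # 32 bands x 4 rows = 128 signature values
PRIME = (1 << 61) - 1      # Mersenne prime for the universal hash family


def _hash64(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


# Fixed (a, b) coefficients for h_i(x) = (a*x + b) mod PRIME, one pair per permutation.
_COEFFS = [(_hash64(f"a{i}") % (PRIME - 1) + 1, _hash64(f"b{i}") % PRIME)
           for i in range(NUM_PERM)]


def shingles(code: str) -> set[str]:
    """Overlapping runs of SHINGLE_SIZE tokens; whitespace and layout are ignored."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < SHINGLE_SIZE:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + SHINGLE_SIZE])
            for i in range(len(tokens) - SHINGLE_SIZE + 1)}


def minhash(code: str) -> list[int]:
    """MinHash signature: the minimum of each hash function over all shingles."""
    values = [_hash64(s) for s in shingles(code)]
    return [min((a * x + b) % PRIME for x in values) for a, b in _COEFFS]


def band_keys(signature: list[int]) -> list[tuple]:
    """Split the signature into BANDS bands of ROWS values; each band is one bucket key."""
    return [(i, tuple(signature[i * ROWS:(i + 1) * ROWS])) for i in range(BANDS)]


def decontaminate(train_files: list[str], benchmark: list[str]) -> list[str]:
    """Drop every training file that shares at least one LSH bucket with a benchmark sample."""
    contaminated_buckets = set()
    for sample in benchmark:
        contaminated_buckets.update(band_keys(minhash(sample)))
    return [f for f in train_files
            if not any(key in contaminated_buckets for key in band_keys(minhash(f)))]
```

With b bands of r rows, two files whose shingle sets have Jaccard similarity s share a bucket with probability 1 - (1 - s^r)^b. With 32 bands of 4 rows, the soft threshold therefore sits near (1/32)^(1/4) ≈ 0.42. A stricter pipeline might also verify each candidate with an exact Jaccard check before dropping it.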

So, we have to trust your procedure...
You can check whether the model reproduces the canonical solutions from HumanEval. I understand it's not ideal, but at least you can check it yourself.
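One rough way to run the check described in the comment above is sketched here. It assumes you already have the model's completion for each HumanEval prompt; generating them is left out. The sketch strips comments and indentation from each completion, then measures how close it is to the dataset's `canonical_solution` using `difflib`. The token normalization, the 0.9 cutoff, and the `code_tokens` / `memorization_report` helpers are hypothetical choices for illustration.

```python
import difflib
import io
import tokenize

# Token types that carry no semantic content for this comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def code_tokens(code: str) -> list[str]:
    """Python tokens with comments, blank lines and indentation removed."""
    try:
        return [tok.string
                for tok in tokenize.generate_tokens(io.StringIO(code).readline)
                if tok.type not in _SKIP]
    except (tokenize.TokenError, SyntaxError):
        # Truncated or malformed completions: fall back to whitespace splitting.
        return code.split()


def memorization_report(pairs, threshold: float = 0.9) -> list[tuple[str, float]]:
    """pairs: iterable of (task_id, model_completion, canonical_solution).

    Returns the tasks whose completion is near-verbatim to the canonical solution.
    """
    flagged = []
    for task_id, completion, canonical in pairs:
        ratio = difflib.SequenceMatcher(
            None, code_tokens(completion), code_tokens(canonical), autojunk=False
        ).ratio()
        if ratio >= threshold:
            flagged.append((task_id, round(ratio, 3)))
    return flagged
```

A completion that differs from the canonical solution only in comments or spacing scores 1.0. Many high scores would hint at memorization. Short canonical solutions can also be matched legitimately, though, so the flags are a signal rather than proof.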

There are a bunch of other benchmarks too; check out the model page: https://huggingface.co/smallcloudai/Refact-1_6B-fim

Also, feel free to run any new benchmarks.