by riku_iki 1023 days ago
So, we have to trust your procedure...
1 comment

You can check whether the model reproduces the canonical solutions from HumanEval. I understand it's not ideal, but at least you can check it yourself.
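
A rough sketch of what such a check could look like, not our actual procedure: compare each model completion with the benchmark's canonical solution and flag near-identical ones. The field names `task_id`, `prompt` and `canonical_solution` are the ones used in the HumanEval dataset; `generate` is a hypothetical callable that wraps whatever model you are testing, and the 0.95 threshold is an arbitrary assumption.

```python
import difflib


def normalize(code: str) -> str:
    """Strip indentation and blank lines so formatting differences don't matter."""
    lines = [line.strip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def similarity(generated: str, canonical: str) -> float:
    """Return a similarity ratio between 0 and 1 for two code snippets."""
    matcher = difflib.SequenceMatcher(None, normalize(generated), normalize(canonical))
    return matcher.ratio()


def flag_memorized(problems, generate, threshold=0.95):
    """Return (task_id, score) pairs whose completion nearly matches the canonical solution.

    `problems` is an iterable of dicts with HumanEval-style fields.
    `generate` is a hypothetical prompt -> completion callable (an assumption).
    `threshold` is an arbitrary cutoff, not a recommended value.
    """
    flagged = []
    for problem in problems:
        completion = generate(problem["prompt"])
        score = similarity(completion, problem["canonical_solution"])
        if score >= threshold:
            flagged.append((problem["task_id"], score))
    return flagged
```

Keep in mind that short problems often have only one obvious solution, so an exact match on those says little by itself; a high match rate on the longer problems is the more interesting signal.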

There are a bunch of other benchmarks too; check out the page: https://huggingface.co/smallcloudai/Refact-1_6B-fim

Also, feel free to run any new benchmarks.