Hacker News new | ask | show | jobs
by JegernOUTT 1020 days ago
It can be checked if the model predicts canonical solutions from humaneval. I understand it is not ideal, but at least you can check it yourself

There are a bunch of other benchmarks too, check out the page https://huggingface.co/smallcloudai/Refact-1_6B-fim

Also, feel free to run any new benchmarks