by jfaganel99 84 days ago
I'm working on a model benchmark focused on which models are good at these tasks. I'll keep you posted.

Thanks, that would be great.
As promised, here is the open-source Git repo so you can try it with your own tooling: https://github.com/kolega-ai/Real-Vuln-Benchmark

Updated benchmark results are also published here. BTW, with v002 we are consistently hitting 75+.