Hacker News
by
jfaganel99
84 days ago
I'm working on a model benchmark focused on which models are good at these tasks. I'll keep you posted.
fhouser
84 days ago
Thanks, that would be great.
jfaganel99
80 days ago
As promised, here is the open-source Git repo so you can give it a go with your tooling:
https://github.com/kolega-ai/Real-Vuln-Benchmark
Updated benchmark results are also published here. BTW, with v002 we are consistently hitting 75+.