Hacker News
by yaronsc, 444 days ago
Benchmarks are WIP. We're thinking about durability, task latency, agent throughput. What else would you like to see?
namnnumbr, 443 days ago
Pass^k and not Pass@k (see https://www.philschmid.de/agents-pass-at-k-pass-power-k). It would be a great twofer to see the code used to run the benchmarks published as examples.
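To make the distinction concrete, here is a minimal sketch. It assumes each task is run n times with c successful runs and uses the commonly used combinatorial estimators: pass@k (as in the HumanEval/Codex evaluation) is the chance that at least one of k sampled runs succeeds, and pass^k (as in τ-bench) is the chance that all k succeed. The function names are hypothetical and not taken from any particular benchmark harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k: probability that at least one of k runs, drawn without
    replacement from n recorded runs with c successes, succeeds."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failed runs.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """pass^k: probability that all k runs, drawn without replacement
    from n recorded runs with c successes, succeed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) counts the k-subsets made up only of successful runs
    # (it is 0 when k > c).
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # An agent that succeeded on 8 of 10 runs of a task.
    n, c, k = 10, 8, 3
    print(f"pass@{k} = {pass_at_k(n, c, k):.3f}")   # pass@3 = 1.000
    print(f"pass^{k} = {pass_pow_k(n, c, k):.3f}")  # pass^3 = 0.467
```

The example shows why the metric choice matters for agents: an 80% success rate looks perfect under pass@3 but drops to roughly 47% under pass^3, which is closer to what users experience when they need the agent to succeed every time. Per-task scores would typically be averaged across the benchmark.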
yaronsc, 443 days ago
Will take a look, thanks!