

	by mentalgear 455 days ago
	I feel like, at this point, we need an AI agent to compare AI agent frameworks. And benchmarks: well-thought-out, structured, non-cherry-picked benchmarks that show which framework does well in which area.

1 comment

Benchmarks are WIP. We're thinking about durability, task latency, agent throughput. What else would you like to see?

Pass^k rather than Pass@k (see https://www.philschmid.de/agents-pass-at-k-pass-power-k). It would also be a great twofer to see the code used to run the benchmarks published as examples.
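
For anyone unsure of the difference, here is a minimal sketch of the two metrics, assuming each task is run n times and c of those runs succeed. Pass@k is the chance that at least one of k sampled runs succeeds; Pass^k is the chance that all k succeed, which is why it rewards reliability. The function names `pass_at_k` and `pass_pow_k` are my own, not taken from any framework, and this uses the usual combinatorial estimators rather than anything specific to the benchmarks discussed here.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = trials run for one task, c = successful trials, k = sample size.
    if n < 1 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need n >= 1, 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that AT LEAST ONE of k runs, drawn without replacement
    from the n recorded runs, is a success."""
    _check(n, c, k)
    # comb(n - c, k) counts the draws made up entirely of failures;
    # it is 0 when there are fewer than k failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Chance that ALL k runs, drawn without replacement from the
    n recorded runs, are successes."""
    _check(n, c, k)
    # comb(c, k) counts the draws made up entirely of successes;
    # it is 0 when there are fewer than k successes.
    return comb(c, k) / comb(n, k)


def mean_over_tasks(metric, results, k: int) -> float:
    """Average a metric over tasks; results is a list of (n, c) pairs."""
    return sum(metric(n, c, k) for n, c in results) / len(results)
```

With 10 runs, 7 successes and k = 3, `pass_at_k(10, 7, 3)` is 119/120 (about 0.99) while `pass_pow_k(10, 7, 3)` is 35/120 (about 0.29), so an agent that looks nearly perfect under Pass@k can still fail most of the time when you need it to work three times in a row.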

Will take a look, thanks!