Hacker News
by ilusion 2 hours ago
I'm very curious to see a benchmark for this. I've toyed with the idea myself but haven't put in the hard work to test these hypotheses about extracting learning signal from deep-agent traces.
1 comment

There are some benchmarks in the repo for AppWorld. Looks promising.