Hacker News
by ilusion, 2 hours ago
I'm very curious to see a benchmark for this. I've toyed with the idea myself but haven't put in the hard work to test these hypotheses about extracting learning signal from deep-agent traces.
funfunfunction, 2 hours ago
There are some benchmarks in the repo for AppWorld. Looks promising.