Hacker News
by gyudin 393 days ago
Super weird benchmarks
1 comment

From what I gather, it's fine-tuned to use OpenHands specifically, so it shows its value on benchmarks that treat the whole system (i.e. agent + LLM) as a black box, more than on ones that directly target the LLM's inputs/outputs.