Hacker News
by sjmaplesec 104 days ago
There's so much more we can do around activation and skill creation. Looking at the eval results, there are even cases where the added context makes the agent perform worse.

Scenario 5, test 1: 72% -> 22%

https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...