Hacker News
by sjmaplesec 104 days ago
There's so much more we can do around activation and skills creation. Looking at the eval results, there are even cases where the context makes the agent worse.
Scenario 5, test 1: 72% -> 22%
https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...