
by sjmaplesec 104 days ago

There's so much more we can do around activation and skills creation. The eval results even show cases where the added context makes the agent worse.

Scenario 5, test 1: 72% -> 22%

https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...