|
|
|
|
|
by jaggederest
127 days ago
|
|
I use that to feed back into my spec development and prompting and CI harnesses, not steering in real time. Every mistake is a chance to fix the system so that mistake is less likely or impossible. I rarely fix anything in real time - you review, see issues, fix them in the spec, reset the branch back to zero and try again. Generally, the spec is the part I develop interactively, and then set it loose to go crazy. This feels, initially, incredibly painful. You're no longer developing software, you're doing therapy for robots. But it delivers enormous compounding gains, and you can use your agent to do significant parts of it for you. |
|
Or, really, hacking in "learning", building your knowhow-base.
> But it delivers enormous compounding gains, and you can use your agent to do significant parts of it for you.
Strong yes to both, so strong that it's curious Claude Code, Codex, Claude Cowork, etc., don't yet bake in an explicit knowledge evolution agent curating and evolving their markdown knowledge base:
https://github.com/anthropics/knowledge-work-plugins
Unlikely to help with benchmarks. Very likely to improve utility ratings (as rated by outcome improvements over time) from teams using the tools together.
For those following along at home:
This is the return of the "expert system", now running on a generalized "expert system machine".