by foundatron 104 days ago
Right now OctopusGarden logs every LLM call with token counts and cost, and the SQLite store records each run and iteration (spec hash, scores per scenario, generated code). So you get a full trace of what was generated, what it was tested against, and how it scored.
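For anyone wondering what that trace might look like, here is a minimal sketch using Python's standard-library `sqlite3` module. The table and column names (`runs`, `iterations`, `llm_calls`) and the helper functions are hypothetical, not OctopusGarden's actual schema. Keeping the per-call token/cost log in the same database is also an assumption, since the comment only says the calls are logged.

```python
import hashlib
import json
import sqlite3

# Hypothetical schema: names are illustrative, not OctopusGarden's real tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    spec_hash TEXT NOT NULL              -- SHA-256 of the spec text
);
CREATE TABLE IF NOT EXISTS iterations (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES runs(id),
    n INTEGER NOT NULL,                  -- iteration number within the run
    scenario_scores TEXT NOT NULL,       -- JSON object: scenario name -> score
    generated_code TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS llm_calls (
    id INTEGER PRIMARY KEY,
    iteration_id INTEGER NOT NULL REFERENCES iterations(id),
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    cost_usd REAL NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def start_run(conn: sqlite3.Connection, spec_text: str) -> int:
    """Insert a run keyed by the hash of the spec it was generated from."""
    spec_hash = hashlib.sha256(spec_text.encode("utf-8")).hexdigest()
    with conn:  # commits the transaction on success
        cur = conn.execute("INSERT INTO runs (spec_hash) VALUES (?)", (spec_hash,))
    return cur.lastrowid


def record_iteration(conn: sqlite3.Connection, run_id: int, n: int,
                     scores: dict, code: str) -> int:
    """Store one iteration: per-scenario scores (as JSON) plus the generated code."""
    with conn:
        cur = conn.execute(
            "INSERT INTO iterations (run_id, n, scenario_scores, generated_code) "
            "VALUES (?, ?, ?, ?)",
            (run_id, n, json.dumps(scores), code),
        )
    return cur.lastrowid


def record_llm_call(conn: sqlite3.Connection, iteration_id: int,
                    prompt_tokens: int, completion_tokens: int, cost_usd: float) -> None:
    """Log a single LLM call with its token counts and cost."""
    with conn:
        conn.execute(
            "INSERT INTO llm_calls (iteration_id, prompt_tokens, completion_tokens, cost_usd) "
            "VALUES (?, ?, ?, ?)",
            (iteration_id, prompt_tokens, completion_tokens, cost_usd),
        )


def run_cost(conn: sqlite3.Connection, run_id: int) -> float:
    """Total LLM spend across every iteration of a run."""
    row = conn.execute(
        "SELECT COALESCE(SUM(c.cost_usd), 0) FROM llm_calls c "
        "JOIN iterations i ON c.iteration_id = i.id WHERE i.run_id = ?",
        (run_id,),
    ).fetchone()
    return row[0]


def run_trace(conn: sqlite3.Connection, run_id: int) -> list:
    """Return (iteration number, per-scenario scores) for each iteration, in order."""
    rows = conn.execute(
        "SELECT n, scenario_scores FROM iterations WHERE run_id = ? ORDER BY n",
        (run_id,),
    ).fetchall()
    return [(n, json.loads(scores)) for n, scores in rows]
```

Hashing the spec means two runs against the same spec text share a `spec_hash`, so you can group results by spec version without storing the spec itself in every row.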

For approvals, the current model is that the spec is the approval. If the spec is right and scenarios pass at 95%+ satisfaction, the code ships. There's no PR review step by design (the "code is opaque weights" philosophy).
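A sketch of that ship rule, assuming "satisfaction" is the mean of per-scenario scores in the range 0 to 1 (the comment doesn't say whether it's a mean, a minimum, or something else). `converged` and `SATISFACTION_THRESHOLD` are hypothetical names.

```python
from statistics import fmean
from typing import Mapping

# The "95%+" figure from the comment, expressed as a fraction.
SATISFACTION_THRESHOLD = 0.95


def converged(scenario_scores: Mapping[str, float],
              threshold: float = SATISFACTION_THRESHOLD) -> bool:
    """Return True when the code ships under the "spec is the approval" model.

    Assumption: satisfaction = mean of per-scenario scores, each in [0, 1].
    There is deliberately no review step: clearing the threshold is the approval.
    """
    if not scenario_scores:
        return False  # nothing was tested, so nothing is approved
    return fmean(scenario_scores.values()) >= threshold
```

Swapping `fmean` for `min` would make the gate stricter, since every individual scenario would then have to clear the bar instead of the average.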

That said, you could totally layer approvals on top. Gate on spec changes, require sign-off before a run kicks off, or add a human checkpoint between "converged" and "deployed." The tool doesn't enforce a deployment pipeline, so that's up to your org's workflow.
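One way those checkpoints could be bolted on, sketched in Python. `run_with_gates`, the stage names, and the `approve` callback are all hypothetical; `approve` might be a CLI confirmation or a chat prompt in practice, and `generate`/`converged`/`deploy` stand in for whatever the tool and your pipeline actually do.

```python
from typing import Any, Callable, Collection, Optional

# Hypothetical stage names for the three checkpoints mentioned above.
ALL_GATES = ("spec-change", "pre-run", "pre-deploy")

# Hypothetical approval hook: (stage, spec_hash) -> True if a human signs off.
Approver = Callable[[str, str], bool]


def run_with_gates(
    spec_hash: str,
    previous_spec_hash: Optional[str],
    enabled_gates: Collection[str],
    approve: Approver,
    generate: Callable[[], Any],
    converged: Callable[[Any], bool],
    deploy: Callable[[Any], None],
) -> str:
    """Wrap a generate/score loop with optional human checkpoints; returns a status."""
    # Gate on spec changes: only asks when the spec hash differs from the last one.
    if "spec-change" in enabled_gates and spec_hash != previous_spec_hash:
        if not approve("spec-change", spec_hash):
            return "blocked at spec-change"
    # Require sign-off before a run kicks off.
    if "pre-run" in enabled_gates and not approve("pre-run", spec_hash):
        return "blocked at pre-run"
    result = generate()
    if not converged(result):
        return "not converged"
    # Human checkpoint between "converged" and "deployed".
    if "pre-deploy" in enabled_gates and not approve("pre-deploy", spec_hash):
        return "converged, not deployed"
    deploy(result)
    return "deployed"
```

With `enabled_gates` empty this collapses back to the current behavior, where converging is the same as being approved to deploy.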

Worth noting: this is purely a hobby project at this point. It hasn't been used in any commercial setting. The guardrails and approval workflow stuff is where it would need the most work before anyone used it for real.