by dbmikus 16 days ago
To stop agents from pausing for checkpointing, you can wrap them in a deterministic outer loop that re-runs the agent until a stop condition is met.
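
A minimal sketch of such an outer loop, to make the idea concrete. `run_agent` and `is_done` are hypothetical callables (not any harness's real API): the first sends a prompt to whatever agent you use and returns its output, the second is a deterministic check such as "the test suite passes".

```python
from typing import Callable


def run_until_done(
    run_agent: Callable[[str], str],   # hypothetical: sends a prompt, returns the agent's output
    is_done: Callable[[str], bool],    # deterministic stop condition, e.g. "tests pass"
    task: str,
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Re-run the agent until is_done accepts its output; return (output, attempts)."""
    prompt = task
    for attempt in range(1, max_iterations + 1):
        output = run_agent(prompt)
        if is_done(output):
            return output, attempt
        # The agent stopped early: feed its output back and ask it to continue.
        prompt = f"{task}\n\nPrevious attempt (unfinished):\n{output}\n\nContinue."
    raise RuntimeError(f"stop condition not met after {max_iterations} iterations")
```

The iteration cap is there so a stop condition that can never be met fails loudly instead of burning tokens forever.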

I think teams need to be able to write nested workflows that transition between code-led and agent-led, with either supporting human-in-the-loop checkpoints.
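
One way this nesting could look, as a rough sketch rather than a description of any existing product. `Step`, `run_workflow`, `nested`, and the `approve` callback are all names made up for illustration; a step's `run` can be plain code or a wrapper around an agent call, and any step can ask a human to approve before the workflow continues.

```python
from dataclasses import dataclass
from typing import Callable

State = dict


@dataclass
class Step:
    name: str
    run: Callable[[State], State]  # code-led function, or a wrapper around an agent call
    checkpoint: bool = False       # if True, ask a human before continuing


def run_workflow(
    steps: list[Step],
    state: State,
    approve: Callable[[str, State], bool],  # hypothetical human-in-the-loop callback
) -> State:
    """Run steps in order; stop at the first checkpoint the human rejects."""
    for step in steps:
        state = step.run(state)
        if state.get("halted_at"):
            break  # a nested workflow was halted; propagate the halt upward
        if step.checkpoint and not approve(step.name, state):
            state = {**state, "halted_at": step.name}
            break
    return state


def nested(name: str, steps: list[Step], approve: Callable[[str, State], bool]) -> Step:
    """Wrap a whole workflow as a single step so workflows can nest."""
    return Step(name, lambda state: run_workflow(steps, state, approve))
```

Because a nested workflow is just another `Step`, a code-led pipeline can hand one stage to an agent-led sub-workflow (or the reverse) without the outer loop knowing the difference.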

Been iterating on what this should look like at our startup (https://www.amika.dev/). Model labs are also improving capabilities here, such as Codex's `/goal` and Claude Code's dynamic workflows[1].

The points about API usage cost still stand, but model intelligence is getting cheaper every month! No need to use the frontier model for every part of the work.

[1]: https://code.claude.com/docs/en/workflows

1 comment

It's hard to get that outer loop done, especially considering that Claude doesn't let you automate the harness anymore (it gets prohibitively expensive). The same goes for Gemini. The only option is Codex.

/goal is a dynamic workflow itself, from what I know. Dynamic workflows do not hold the initiative (and can't use any libraries or I/O).

Dynamic workflows do not prevent checkpointing.

I don't see the actual point of your startup; it's a cheap idea, like most LLM startups out there.

I don't see how models are getting cheaper; I clearly see the opposite trend.

Claude Code's dynamic workflows are AI-generated JavaScript, so unlike `/goal` they can in theory import libraries and perform I/O (I'm not sure whether they can currently).

On checkpointing: I explained myself poorly. You're right that using higher-level workflows doesn't turn off checkpointing. One can simply make harnesses non-interactive, but that can make models lose coherence over long tasks (because they can't ask for feedback). A higher-level coordinator (`/goal`, CC dynamic workflows) is designed to provide this feedback without human intervention.

On price: older models keep getting cheaper, and most tasks don't need frontier capability. (I'm ignoring the part about subscription subsidies right now, and just talking about API price for tokens.)

On my startup Amika: we run programmable cloud computers for agents, plus the workflow systems to guide them. We let people run any agent (Codex, Claude, etc.) and prompt it from anywhere (Slack, web, CLI + SSH, API). It's like devboxes for humans + agents, with guardrails[1] that deterministically enforce properties of the changes coding agents make (e.g. don't let an agent modify module boundaries, require every DB query to carry a multi-tenant org ID filter).
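
To illustrate the kind of deterministic check meant by the org ID example, here is a toy sketch. It is not how Amika implements guardrails; the function name, the regex, and the column name `org_id` are all assumptions for illustration, and a real check would parse the SQL rather than pattern-match it.

```python
import re

# Assumption: tenant scoping is expressed as a filter on a column named `org_id`.
ORG_FILTER = re.compile(r"\bwhere\b.*\borg_id\s*=", re.IGNORECASE | re.DOTALL)


def queries_missing_org_filter(queries: list[str]) -> list[str]:
    """Return the queries that have no `org_id = ...` condition after WHERE."""
    return [q for q in queries if not ORG_FILTER.search(q)]
```

Run against the queries an agent's change introduces, a non-empty result would fail the change regardless of what the model claims about it.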

Maybe our website is bad at explaining it, in which case I appreciate any feedback!

[1]: https://docs.amika.dev/guides/code-annotations