that's the user-facing definition but the implementation distinction matters more.
"takes longer than you're willing to wait" describes the UX, not the architecture. the engineering question is: does the system actually free up the caller's compute/context to do other work, or is it just hiding a spinner?
nost agent frameworks i've worked with are the latter - the orchestrator is still holding the full conversation context in memory, burning tokens on keep-alive, and can't actually multiplex. real async means the agent's state gets serialized, the caller reclaims its resources, and resumption happens via event - same as the difference between setTimeout with a polling loop vs. actual async/await with an event loop.
IMO feels sorta like Simon Willison's definition of agents. "LLMs in a loop with a goal" feels super obvious, but not sure if I would have described it that way in hindsight
One nuance that helps: “async” in the turn-based-telephone sense (you ask, it answers, you wait) is only one way agents can run.
Another is many turns inside a single LLM call — multiple agents (or voices) iterating and communicating dozens or hundreds of times in one epoch, with no API round-trips between them.
That’s “speed of light” vs “carrier pigeon”: no serialization across the boundary until you’re done. We wrote this up here: Speed of Light – MOOLLM (the README has the carrier-pigeon analogy and a 33-turn-in-one-call example).
Speed of Light vs Carrier Pigeon:
The fundamental architectural divide in AI agent systems.
The Core Insight: There are two ways to coordinate multiple AI agents:
Carrier Pigeon
Where agents interact: between LLM calls
Latency: 500 ms+ per hop
Precision: degrades each hop
Cost: high (re-tokenize everything)
Speed of Light
Where agents interact: during one LLM call
Latency: instant
Precision: perfect
Cost: low (one call)
MCP = Carrier Pigeon
Each tool call:
stop generation →
wait for external response →
start a new completion
N tool calls ⇒ N round-trips
MOOLLM Skills and agents can run at the Speed of Light. Once loaded into context, skills iterate, recurse, compose, and simulate multiple agents — all within a single generation. No stopping. No serialization.
Maybe, but that's what I thought while reading the "what actually is async?" part of the post, so I don't think I got biased towards the answer by that point.
"takes longer than you're willing to wait" describes the UX, not the architecture. the engineering question is: does the system actually free up the caller's compute/context to do other work, or is it just hiding a spinner?
nost agent frameworks i've worked with are the latter - the orchestrator is still holding the full conversation context in memory, burning tokens on keep-alive, and can't actually multiplex. real async means the agent's state gets serialized, the caller reclaims its resources, and resumption happens via event - same as the difference between setTimeout with a polling loop vs. actual async/await with an event loop.