Hacker News new | ask | show | jobs
by stavros 131 days ago
Not to be all captain hindsight, but I was puzzled as I was skimming the post, as this seemed obvious to me:

Something is async when it takes longer than you're willing to wait without going off to do something else.

2 comments

that's the user-facing definition but the implementation distinction matters more.

"takes longer than you're willing to wait" describes the UX, not the architecture. the engineering question is: does the system actually free up the caller's compute/context to do other work, or is it just hiding a spinner?

nost agent frameworks i've worked with are the latter - the orchestrator is still holding the full conversation context in memory, burning tokens on keep-alive, and can't actually multiplex. real async means the agent's state gets serialized, the caller reclaims its resources, and resumption happens via event - same as the difference between setTimeout with a polling loop vs. actual async/await with an event loop.

yes you got the point here
IMO feels sorta like Simon Willison's definition of agents. "LLMs in a loop with a goal" feels super obvious, but not sure if I would have described it that way in hindsight
One nuance that helps: “async” in the turn-based-telephone sense (you ask, it answers, you wait) is only one way agents can run.

Another is many turns inside a single LLM call — multiple agents (or voices) iterating and communicating dozens or hundreds of times in one epoch, with no API round-trips between them.

That’s “speed of light” vs “carrier pigeon”: no serialization across the boundary until you’re done. We wrote this up here: Speed of Light – MOOLLM (the README has the carrier-pigeon analogy and a 33-turn-in-one-call example).

Speed of Light vs Carrier Pigeon: The fundamental architectural divide in AI agent systems.

https://github.com/SimHacker/moollm/blob/main/designs/SPEED-...

The Core Insight: There are two ways to coordinate multiple AI agents:

  Carrier Pigeon
    Where agents interact: between LLM calls
    Latency: 500 ms+ per hop
    Precision: degrades each hop
    Cost: high (re-tokenize everything)
  Speed of Light
    Where agents interact: during one LLM call
    Latency: instant
    Precision: perfect
    Cost: low (one call)
  MCP = Carrier Pigeon
    Each tool call:
      stop generation → 
      wait for external response → 
      start a new completion
    N tool calls ⇒ N round-trips
MOOLLM Skills and agents can run at the Speed of Light. Once loaded into context, skills iterate, recurse, compose, and simulate multiple agents — all within a single generation. No stopping. No serialization.
Thanks for sharing, that distinction is very helpful.
Maybe, but that's what I thought while reading the "what actually is async?" part of the post, so I don't think I got biased towards the answer by that point.