How do you reason about retries and correctness once a stream has already started? For example, how do you avoid duplicated or missing tokens if a provider fails mid-stream?
This is one of the harder problems, and there isn’t a perfect answer.
The main thing we try to avoid is pretending mid-stream retries are the same as pre-request retries. Once a stream has started, we treat it as a sequence of events with checkpoints rather than a single opaque response. Retries are scoped to known safe boundaries, and anything ambiguous is surfaced explicitly instead of silently re-emitting tokens.
In other words, correctness is prioritized over pretending the stream is seamless. If we can’t guarantee no duplication, we make that visible rather than hide it.
The main thing we try to avoid is pretending mid-stream retries are the same as pre-request retries. Once a stream has started, we treat it as a sequence of events with checkpoints rather than a single opaque response. Retries are scoped to known safe boundaries, and anything ambiguous is surfaced explicitly instead of silently re-emitting tokens.
In other words, correctness is prioritized over pretending the stream is seamless. If we can’t guarantee no duplication, we make that visible rather than hide it.