| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vishaal_007 149 days ago

I’m another founder on this. One thing that surprised us while building AI features was how often the hard problems weren’t about model choice, but about request lifecycle. Once you introduce streaming, retries, and multiple providers, a lot of implicit assumptions in typical request–response code stop holding.

We kept seeing teams reinvent similar patterns in slightly different ways, especially around correlating events, handling partial failures, and keeping the frontend in sync with what actually happened on the backend. The goal with this writeup was to make those tradeoffs explicit and show what’s actually happening on the wire in each approach.

Curious to hear how others here are handling long-lived or streaming AI requests in production, especially once things start failing in non-obvious ways.