I run a hosted, code-first agent builder platform in production, so I answer these questions a lot for our customers.

1. fly.io is probably the best option, IMHO. It strikes a nice balance between running ephemeral containers that can support long-running tasks and booting quickly to respond to a tool call. [1]
2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put trigger [2] or temporal [3] under it.
3. A mix of prompt caching, context shedding, and progressive context enrichment [4].
4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this. But most of my customers do not.
5. To start with, a simple postgres table and pgvector is all you need. But I've recently been delighted with the DX of Upstash vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at a higher scale, I have heard good things about marqo.ai [6].

Happy to talk more about this at length. (E-mail in the profile.)

[1] https://fly.io/docs/reference/architecture/
[2] trigger.dev
[3] temporal.io
[4] https://www.inferable.ai/blog/posts/llm-progressive-context-...
[5] https://upstash.com/docs/vector/overall/getstarted
[6] https://www.marqo.ai/
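To make "context shedding" from point 3 a bit more concrete, here is a minimal Python sketch of one common reading of the term: keep the system prompt, then keep only the newest turns that fit a token budget. This is an illustration under stated assumptions, not how any particular platform does it. The names `estimate_tokens` and `shed_context` are hypothetical, and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Swap in the tokenizer for your model for real budgets.
    return max(1, len(text) // 4)


def shed_context(messages: List[Message], budget: int) -> List[Message]:
    """Keep system messages plus the newest turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    # System messages are always kept, so they are charged first.
    remaining = budget - sum(estimate_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first turn that does not fit,
    # so the surviving history is always a contiguous recent window.
    for m in reversed(turns):
        cost = estimate_tokens(m["content"])
        if cost > remaining:
            break
        kept.append(m)
        remaining -= cost

    kept.reverse()  # restore chronological order
    return system + kept
```

Dropping whole turns from the oldest end keeps the prefix (the system prompt) stable, which tends to play well with prompt caching. Summarising the dropped turns instead of discarding them is a reasonable variation, at the cost of an extra model call.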
I actually tried fly.io briefly with Next.js apps and the deployment experience was smooth. Really interesting to hear you're using it for AI workloads too.
For fly.io with AI workloads: Are you using their Machines or Apps? I'm particularly curious about how you're handling cold starts for LLM tasks, since that was one thing I loved about fly.io for regular Next.js deployments - the cold starts were minimal.