by lunarcave 526 days ago
I have a hosted code-first agent builder platform in production, so I answer these questions a lot for our customers.

1. Probably the best is fly.io IMHO. It has a nice balance between running ephemeral containers that can support long running tasks, and quickly booting up to respond to a tool call. [1]

2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put trigger [2] or temporal [3] under it.

3. A mix of prompt caching, context shedding, progressive context enrichment [4].

4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this. But most of my customers do not.

5. To start with, a simple postgres table and pgvector is all you need. But I've recently been delighted with the DX of Upstash vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at higher scale, I have heard good things about marqo.ai [6].
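
For anyone wondering what the "simple postgres table and pgvector" setup in point 5 might look like, here is a minimal sketch. The table name, column names, and the 1536 dimension are my assumptions for illustration, not something from a real deployment. The SQL uses pgvector's `vector` type and its `<=>` cosine distance operator; the Python helpers only build the query parameters, so nothing here needs a live database.

```python
# Sketch only: table/column names and the embedding size are assumptions.
EMBEDDING_DIM = 1536  # assumed; must match the embedding model you use

# One-time setup: enable pgvector and create a plain table with a vector column.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
"""

# Nearest-neighbour lookup. <=> is pgvector's cosine distance operator,
# so ordering ascending returns the most similar rows first.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format numbers as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def search_params(query_embedding, k=5):
    """Build the parameter tuple for SEARCH_SQL, checking the dimension."""
    if len(query_embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_embedding)}"
        )
    return (to_vector_literal(query_embedding), k)
```

With a driver that uses `%s` placeholders (psycopg, for example), the lookup would be roughly `cur.execute(SEARCH_SQL, search_params(embedding))`, where `embedding` comes from whatever embedding model you already call.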

Happy to talk more about this at length. (E-mail in the profile)

[1] https://fly.io/docs/reference/architecture/

[2] trigger.dev

[3] temporal.io

[4] https://www.inferable.ai/blog/posts/llm-progressive-context-...

[5] https://upstash.com/docs/vector/overall/getstarted

[6] https://www.marqo.ai/


Thanks for the detailed response!

I actually tried fly.io briefly with Next.js apps and the deployment experience was smooth. Really interesting to hear you're using it for AI workloads too.

For fly.io with AI workloads: Are you using their Machines or Apps? I'm particularly curious about how you're handling cold starts for LLM tasks, since that was one thing I loved about fly.io for regular Next.js deployments - the cold starts were minimal.

I'm using the apps mostly. But I think you can use machines for lower-level use cases. IIRC, apps run machines.