Basically https://replicate.com/

Because it happens when running your own models on localhost too. I have ollama and all the models it supports, but there are some on HuggingFace that I run through llama.cpp inside apps where I won't have ollama installed. Replicate also has Stable Diffusion models, not just chat ones, and then there's OpenAI, which is its own thing. So it could potentially all be unified under a provider like that.

I haven't actually tried Replicate because I'm just running locally for free, but I'd probably try to find a single cloud provider for all deployments, like a Heroku of LLMs.
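
To make the "unified under one provider" idea concrete, here's a rough sketch of what I mean: one entry point that routes to whichever backend actually runs the model. Everything in it is made up for illustration (the `Router` class, the `provider/model` naming, the stub backends); the stubs just echo strings and don't call ollama, llama.cpp, or any hosted API.

```python
from typing import Callable, Dict

# Hypothetical backend shape: (model name, prompt) -> generated text.
Backend = Callable[[str, str], str]


def make_stub(label: str) -> Backend:
    """Build a placeholder backend that echoes its input instead of running a model."""
    def run(model: str, prompt: str) -> str:
        return f"{label}:{model} <- {prompt}"
    return run


class Router:
    """Single entry point that dispatches to a registered backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, provider: str, backend: Backend) -> None:
        self._backends[provider] = backend

    def generate(self, target: str, prompt: str) -> str:
        # Targets use an assumed "provider/model" convention, e.g. "ollama/llama3".
        provider, sep, model = target.partition("/")
        if not sep or provider not in self._backends:
            raise ValueError(f"unknown target: {target!r}")
        return self._backends[provider](model, prompt)


router = Router()
router.register("ollama", make_stub("ollama"))      # local models via ollama
router.register("llamacpp", make_stub("llama.cpp"))  # embedded llama.cpp
router.register("hosted", make_stub("hosted"))       # some cloud provider

print(router.generate("ollama/llama3", "hello"))
```

The app only ever talks to `router.generate`, so swapping a local backend for a hosted one would be a change to a `register` call and nothing else.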