Hacker News new | ask | show | jobs
by ij23 1035 days ago
What local/in-K8-cluster models servers would you recommend adding ?

Should we add support for llama.cpp and vllm.ai in the proxy server ? Or should we assume you can host them on your own infra and the proxy server requests your hosted model ?

1 comments

IMO don’t try to be the one stop shop to host models. There are too many players with all sorts of advancements (eg: stopping grammar, continuous batching, novel quantization etc.) and you won’t be able to keep up.

There is a ton of boilerplate around the actual model server that’s just busy work , but if done wrong can be a huge performance suck. Solve that.

Build the proxy that works with the most model servers out there. Do it in a way that once you have mindshare, the model server makers will be find it easy to put up a PR so that they can claim your proxy supports their server.

Don’t take a hard dependency on non-OSS stuff - being able to build an “on-prem” solution (read “deployed into customer’s VPC”) is table stakes for anyone to use your offering for a lot of enterprise use cases.

Edit: another unsolved problem - different models need slightly different prompts to solve the same problem well…

If it makes sense to expand scope to provide a particular model server and the group can easily be the best st it, I say go for it. But do it as a separate (but perhaps connected) project to this.

But in general I’m in agreement that this sounds like a separate concept than any given model server.

That said, where is a list of model servers for the most commonly wanted LLMs at this point?

Perhaps maintaining a list of those that do and don’t work with the proxy would be helpful.

Hey bredren - our supported list (if that's helpful) is here https://litellm.readthedocs.io/en/latest/supported/

We're adding new integrations every day, so if there's any specific one you'd like to add feel free to let us know (discord/ticket/email/etc.) - here's my email: krrish@berri.ai