by kiratp
1035 days ago
IMO don’t try to be the one-stop shop for hosting models. There are too many players with all sorts of advancements (e.g. stopping grammar, continuous batching, novel quantization, etc.) and you won’t be able to keep up.
There is a ton of boilerplate around the actual model server that’s just busywork, but if done wrong it can be a huge performance suck. Solve that. Build the proxy that works with the most model servers out there. Do it in a way that, once you have mindshare, the model server makers will find it easy to put up a PR so that they can claim your proxy supports their server.
Don’t take a hard dependency on non-OSS stuff - being able to build an “on-prem” solution (read: “deployed into the customer’s VPC”) is table stakes for anyone to use your offering for a lot of enterprise use cases.
Edit: another unsolved problem - different models need slightly different prompts to solve the same problem well…
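To make the "proxy with pluggable adapters" idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration (`ModelServerAdapter`, `EchoAdapter`, `Proxy`, the model and server names, the prompt template); it is not any real project's API. The point is only that adding support for a new model server is one small class plus one `register` call, which is the kind of change a server maintainer could easily send as a PR, and that the per-model prompt template lives next to the adapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class ModelServerAdapter(Protocol):
    """Hypothetical interface: the only thing the proxy needs from a backend."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class EchoAdapter:
    """Stand-in backend for this sketch. A real adapter would call a model server."""

    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Proxy:
    """Routes a task to the adapter registered for a model, applying that model's prompt template."""

    def __init__(self) -> None:
        self._adapters: Dict[str, ModelServerAdapter] = {}
        self._templates: Dict[str, str] = {}

    def register(self, model: str, adapter: ModelServerAdapter, template: str = "{task}") -> None:
        # The template must contain a "{task}" placeholder; it defaults to passing the task through.
        self._adapters[model] = adapter
        self._templates[model] = template

    def supported(self) -> List[str]:
        # Doubles as the "list of models that work with the proxy".
        return sorted(self._adapters)

    def complete(self, model: str, task: str) -> str:
        if model not in self._adapters:
            raise KeyError(f"no adapter registered for {model!r}")
        prompt = self._templates[model].format(task=task)
        return self._adapters[model].generate(prompt)


if __name__ == "__main__":
    proxy = Proxy()
    # Hypothetical models and templates, chosen only to show per-model prompt differences.
    proxy.register("model-a", EchoAdapter("server-a"), template="### Instruction:\n{task}\n### Response:")
    proxy.register("model-b", EchoAdapter("server-b"))
    print(proxy.supported())
    print(proxy.complete("model-a", "Summarize this thread."))
```

Keeping the adapter surface this small is a deliberate choice in the sketch: the boilerplate (routing, templates, and whatever else the proxy grows) stays in one place, and each backend only has to implement `generate`.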
|
But in general I’m in agreement that this sounds like a separate concept from any given model server.
That said, where is a list of model servers for the most commonly wanted LLMs at this point?
Perhaps maintaining a list of those that do and don’t work with the proxy would be helpful.