by lovesdogsnsnow
760 days ago
This is interesting! Sort of a super mixture of experts model. What's the latency penalty paid with your router in the middle? The pattern I often see is companies prototyping on the most expensive models, then testing smaller/faster/cheaper models to determine what is actually required for production. For which contexts and products do you foresee your approach being superior? Given you're just passing along inference costs from backend providers and aren't taking margin, what's your long-term plan for profitability?
We generally see the router being most useful once an LLM application is being scaled and cost and speed start to matter a lot. In some cases, though, the output quality actually improved, since we're able to get the best out of GPT-4, Claude, etc.
Long-term, profitability would come from a future version of the router: we save the user time and money, then charge some overhead for the router, with the user still paying less than they would with a single endpoint. Hopefully that makes sense?
Happy to answer any other questions!