by lovesdogsnsnow 760 days ago

This is interesting! Sort of a super mixture of experts model. What's the latency penalty paid with your router in the middle?

The pattern I often see is companies prototyping on the most expensive models, then testing smaller/faster/cheaper models to determine what is actually required for production. For which contexts and products do you foresee your approach being superior?

Given you're just passing along inference costs from backend providers and aren't taking margin, what's your long-term plan for profitability?

danlenton 760 days ago

Great question! Generally the neural network used for the router takes maybe ~20ms during inference. When deployed on-prem, in your own cloud environment, this is the only added latency. When using the public endpoints with our own intermediate server, it might add ~150ms to the time-to-first-token, but inter-token latency is not affected.
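To put those figures in context, here is a rough back-of-the-envelope sketch. It assumes the ~150ms hosted figure is the total overhead added to time-to-first-token (not 150ms on top of the ~20ms router pass), and the provider latencies in the usage note are made-up illustrative values, not measurements.

```python
# Rough latency model using the approximate figures quoted in the comment above.
ROUTER_INFERENCE_MS = 20   # approximate router forward pass (on-prem overhead)
HOSTED_OVERHEAD_MS = 150   # assumed total added time-to-first-token via the public endpoint


def total_latency_ms(provider_ttft_ms, inter_token_ms, n_tokens, hosted=True):
    """Estimate end-to-end latency for a streamed completion of n_tokens.

    The router overhead is paid once, before the first token.
    Inter-token latency is the provider's own and is left unchanged.
    """
    overhead = HOSTED_OVERHEAD_MS if hosted else ROUTER_INFERENCE_MS
    time_to_first_token = provider_ttft_ms + overhead
    return time_to_first_token + inter_token_ms * (n_tokens - 1)
```

With a hypothetical provider at 400ms time-to-first-token and 30ms per token, a 200-token response comes out at 6390ms on-prem versus 6520ms through the hosted endpoint, so the overhead is a small fraction of a long streamed response and a larger fraction of a short one.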

We generally see the router being useful when the LLM application is being scaled, and cost and speed start to matter a lot. In some cases, though, the output quality has actually improved as well, as we're able to squeeze the best out of GPT-4, Claude, etc.

The long-term plan for profitability would come from some future version of the router, where we save the user time and money, and then charge some overhead for the router, with the user still paying less than they would with a single endpoint. Hopefully that makes sense?
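As a purely illustrative sketch of that pricing idea: the router fee is modeled here as a percentage markup on the routed provider cost, which is an assumption, as are all function names and numbers. No actual pricing scheme is described in this thread.

```python
# Hypothetical pricing model: a router fee charged as a markup on routed provider cost.

def routed_price(routed_provider_cost, router_fee_rate):
    """What the user pays when routed: provider cost plus the router's markup."""
    return routed_provider_cost * (1 + router_fee_rate)


def user_saves(single_endpoint_cost, routed_provider_cost, router_fee_rate):
    """True if the routed price, fee included, is still below the single-endpoint cost."""
    return routed_price(routed_provider_cost, router_fee_rate) < single_endpoint_cost


def break_even_fee_rate(single_endpoint_cost, routed_provider_cost):
    """Fee rate at which the routed price equals the single-endpoint cost."""
    return single_endpoint_cost / routed_provider_cost - 1
```

For example, if a workload costs 100 on a single endpoint and 60 when routed, a 25% fee gives a routed price of 75, so the user still pays less. The break-even fee rate shows how much room there is before the saving disappears.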

Happy to answer any other questions!

jonahx 759 days ago

Do you save the user data, i.e., the searches themselves? What do your TOS guarantee about the use of that data?

danlenton 759 days ago

We use this data to improve the base router by default. It's fully anonymized, and you can opt out.

ColinHayhurst 759 days ago

Without an opt-out it would be a no-go, so that's great to hear. What's the downside of opting out?

danlenton 759 days ago

No downside.

nl 759 days ago

If I was doing this I'd negotiate a volume discount, charge the clients the base rate and pocket the difference.

danlenton 759 days ago

Definitely on the cards; we're keeping our options open here. Right now we're just focused on creating value, though.
