| HN Mirror


	by stpedgwdgfhgdd 3 hours ago
	The thing I do not get with these routers is that you will have more cache misses (5 min TTL). And if there is one thing I’ve learned, it's that using the cache is crucial. How does this router translate to $$$ when developing?

adchurch 2 hours ago

You're right and that's why we built the router to be cache aware! Once it starts using one model, the threshold to switch to another model will be higher because the additional cost of the cache miss needs to be worth the cost savings or quality increase.

This is the key thing that other routers we've seen miss: they're stateless, so for a coding agent use case you end up spending more money because of all the cache misses.
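To make the trade-off above concrete, here is a minimal sketch of a cache-aware switching rule. It is not the router's actual implementation: the names (`ModelPrice`, `turn_cost`, `should_switch`) and all prices are hypothetical, and it assumes a constant context size per turn, ignores any cache-write surcharge, and compares cost only, not quality.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Hypothetical per-model prices in USD per million tokens."""
    input_per_mtok: float          # uncached (cache-miss) input
    cached_input_per_mtok: float   # cache-hit input
    output_per_mtok: float


def turn_cost(price: ModelPrice, context_tokens: int, output_tokens: int,
              cache_warm: bool) -> float:
    """Cost of one turn, reading the context from cache or from scratch."""
    in_rate = price.cached_input_per_mtok if cache_warm else price.input_per_mtok
    return (context_tokens * in_rate
            + output_tokens * price.output_per_mtok) / 1_000_000


def should_switch(current: ModelPrice, candidate: ModelPrice,
                  context_tokens: int, output_tokens: int,
                  expected_remaining_turns: int) -> bool:
    """Switch only if the candidate is cheaper even after paying one cold turn."""
    # Staying: every remaining turn hits the current model's warm cache.
    stay = expected_remaining_turns * turn_cost(
        current, context_tokens, output_tokens, cache_warm=True)
    # Switching: the first turn is a cache miss, later turns are warm again.
    switch = turn_cost(candidate, context_tokens, output_tokens, cache_warm=False)
    switch += (expected_remaining_turns - 1) * turn_cost(
        candidate, context_tokens, output_tokens, cache_warm=True)
    return switch < stay


# Made-up prices: the candidate is 3x cheaper across the board.
big = ModelPrice(input_per_mtok=3.0, cached_input_per_mtok=0.3, output_per_mtok=15.0)
small = ModelPrice(input_per_mtok=1.0, cached_input_per_mtok=0.1, output_per_mtok=5.0)

one_turn_left = should_switch(big, small, 100_000, 1_000, expected_remaining_turns=1)
ten_turns_left = should_switch(big, small, 100_000, 1_000, expected_remaining_turns=10)
fresh_context = should_switch(big, small, 0, 1_000, expected_remaining_turns=1)
```

Under these made-up numbers, a cheaper model is not worth switching to with a 100k-token context and one turn left, but it is with ten turns left or with an empty context, where there is no cache to lose.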

alansaber 2 hours ago

That is interesting. It sounds like in practice you only end up routing between 2 models.

adchurch 2 hours ago

I'd say that a typical main agent loop has 1-3 models (obviously very situationally dependent), but when you have subagents those can get routed independently since they have a fresh context window, so there are a lot more degrees of freedom there.

echelon 2 hours ago

Or not routing at all.

In practice you just pick one and stick with it until the API stops or you hit performance issues.

adchurch 2 hours ago

The choice on the first turn is super important for this reason! But if a user prompt sends the convo in a very different direction then often it does make sense to reroute at that point.

mthoms 1 hour ago

This is a key point. I don't know if you can still edit your submission, but I think this would be helpful to mention up front. I'm looking forward to testing this.
