I’d be really good to allow more than two models and change dynamically based on multiple constraints like latency, reasoning complexity, costs, etc.
But the underlying "how to choose a model that's smart enough but not too smart" seems difficult to understand.
But the underlying "how to choose a model that's smart enough but not too smart" seems difficult to understand.