What's this stuff about the model catering to ‘80%’ of generative AI tasks? What model do they expect me to use for the other 20% of the time when my question needs reasoning smarts.
There are APIs that use a very small model to determine the complexity of the request then route it to different apis or models based on the result of that classifier model.
This way you can do cheap/local automatically without the api client having to know anything about it, and the proxy will send the requests out to an expensive big model only when necessary.
This way you can do cheap/local automatically without the api client having to know anything about it, and the proxy will send the requests out to an expensive big model only when necessary.