Missed the edit window: I agree that ideally I'd have a tiny local MoE-style model that estimates the complexity of each request, routes simple, local requests to the instantly available local agent, and sends everything else outside (to one of several models).
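
To make the idea concrete, here is a rough sketch of that routing step, not a real implementation. Everything in it is made up for illustration: `estimate_complexity` is a crude keyword-and-length heuristic standing in for the tiny local model, and the target names (`local`, `remote-small`, `remote-large`) and thresholds are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Route:
    target: str   # which agent/model should handle the request (hypothetical names)
    score: float  # estimated complexity in [0, 1]


# Hypothetical markers that hint a request is not "simple"; purely illustrative.
HARD_MARKERS = ("refactor", "prove", "architecture", "debug", "design")


def estimate_complexity(request: str) -> float:
    """Stand-in for the tiny local model: returns a crude score in [0, 1]."""
    words = request.split()
    length_score = min(len(words) / 50.0, 1.0)
    lowered = request.lower()
    marker_score = 1.0 if any(m in lowered for m in HARD_MARKERS) else 0.0
    return max(length_score, marker_score)


def route(request: str, local_threshold: float = 0.3,
          large_threshold: float = 0.7) -> Route:
    """Send simple requests to the local agent, the rest to a remote model."""
    score = estimate_complexity(request)
    if score < local_threshold:
        return Route("local", score)
    if score < large_threshold:
        return Route("remote-small", score)
    return Route("remote-large", score)


if __name__ == "__main__":
    for text in ("rename this file", "debug the race condition in my scheduler"):
        print(text, "->", route(text).target)
```

In practice the heuristic would be replaced by the small model's own judgment; the point is only that the router runs locally and cheaply before anything leaves the machine.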