by promiseofbeans 408 days ago
I mean, the general-purpose models already do this in a way, by routing to a selected expert. It's a pretty fundamental concept in ensemble learning, which is effectively what MoE experts are.
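A minimal sketch of that routing step, assuming top-k softmax gating. This is one common variant: a gate scores every expert, only the k best run, and their outputs are mixed using weights renormalized over the selected experts. In sparse MoE transformers this typically happens per token inside each MoE layer, not once per request. The function names are illustrative, and the "experts" are toy scalar functions:

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights
    over just those k (other experts get weight 0 and are never run)."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x: float,
                experts: List[Callable[[float], float]],
                gate: Callable[[float], List[float]],
                k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate(x), k))


if __name__ == "__main__":
    experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
    gate = lambda x: [0.1, 2.0, 1.0]  # toy gate: fixed logits, ignores x
    print(top_k_route(gate(3.0), k=2))  # experts 1 and 2 are selected
    print(moe_forward(3.0, experts, gate, k=2))
```

In a real MoE layer the gate is a small learned linear layer and the experts are feed-forward blocks, but the selection logic has the same shape.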

I don't see any reason you couldn't stack more layers of routing in front to select the model. However, every extra routing stage adds its own overhead, so this starts to seem inefficient.
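Stacking a routing stage in front can be sketched as a classifier that picks a whole model before any generation happens. Everything here is hypothetical: `Model`, `keyword_classifier` and `make_front_router` are illustrative names, the keyword match stands in for whatever classifier a real system would use, and each "model" is just a string function:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Model:
    """Stand-in for a whole model (which could itself be an MoE internally)."""
    name: str
    generate: Callable[[str], str]


def keyword_classifier(prompt: str) -> str:
    """Hypothetical front-line router: a crude keyword match.
    A real system would more likely use a small learned classifier."""
    p = prompt.lower()
    if "def " in p or "compile" in p:
        return "code"
    if "integral" in p or "prove" in p:
        return "math"
    return "general"


def make_front_router(models: Dict[str, Model],
                      classify: Callable[[str], str],
                      default: str) -> Callable[[str], str]:
    """Add one routing stage in front: pick a whole model, then let it run.
    Every request pays for classify() before any generation starts."""
    def route(prompt: str) -> str:
        model = models.get(classify(prompt), models[default])
        return model.generate(prompt)
    return route


if __name__ == "__main__":
    models = {
        name: Model(name, lambda p, n=name: f"[{n}] {p}")
        for name in ("code", "math", "general")
    }
    route = make_front_router(models, keyword_classifier, default="general")
    print(route("Prove that sqrt(2) is irrational"))  # handled by the "math" model
```

The extra `classify()` call on every request, on top of whatever routing each model does internally, is where the added cost of stacking routers comes from.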

I think the optimal solution will eventually be companies training and publishing hyper-focused expert models designed to be used alongside other models and a router. Interface vendors could then purchase different experts and assemble the models themselves, much as a phone manufacturer buys parts from many suppliers, even its competitors, to build the best final product. The bigger players (e.g. Apple, in this analogy) might make more parts in-house, but teardowns show that even the latest iPhone still contains Samsung chips.