by zackangelo
253 days ago
Just a bit of feedback:

> Instead of one brittle giant, we orchestrate a Mixture of Experts…

"Mixture of experts" is a specific term of art that describes an architectural detail of a type of transformer model. It does not mean using smaller specialized models for individual tasks. Experts in an MoE model are selected on a per-token basis, not on a per-task or per-generation basis. I know it's tempting to co-opt this term because it would fit nicely for what you're trying to do, but it just adds confusion.
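To make the per-token point concrete, here is a rough sketch of top-k routing in a toy MoE layer. Everything in it is an illustrative assumption rather than any real model's code: the names `route_tokens` and `router_weights`, the sizes, and the random weights are all made up. The only thing it is meant to show is that the router runs once per token, so neighbouring tokens in the same generation can land on different experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(token_vectors, router_weights, top_k=2):
    """For each token, return its top_k (expert_index, gate_weight) pairs.

    Hypothetical helper: router_weights holds one row per expert, and the
    router logit for an expert is the dot product of that row with the token.
    """
    assignments = []
    for vec in token_vectors:
        logits = [sum(w * x for w, x in zip(row, vec)) for row in router_weights]
        probs = softmax(logits)
        # Keep the top_k most probable experts for THIS token only.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in ranked)
        # Renormalise the gate weights over the selected experts.
        assignments.append([(i, probs[i] / norm) for i in ranked])
    return assignments


# Toy setup (assumed sizes): 4 experts, 8-dim hidden states, 6 tokens.
rng = random.Random(0)
num_experts, dim, seq_len = 4, 8, 6
router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

routes = route_tokens(tokens, router_weights, top_k=2)
for position, picks in enumerate(routes):
    print(position, picks)
```

In a real MoE transformer this selection happens inside each MoE layer, so the same token can also be sent to different experts at different depths. None of that resembles dispatching a whole task to a separate specialized model.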