by NitpickLawyer
36 days ago
> Mixture-of-Experts seems like an attempt to do this - the domain structure being extracted into specific sub-models that are presumably trained on particular domain-associated content

This is a common misconception. MoE LLMs are NOT trained with each expert receiving domain-associated data. It's just an unfortunate naming decision that stuck, and it is commonly misunderstood by non-practitioners.
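For illustration, here is a minimal sketch of the per-token top-k routing that MoE layers typically use. Everything in it is an assumption made for the example: the names `route`, `router_weights`, `DIM`, and `NUM_EXPERTS` are hypothetical, the weights are random rather than trained, and real implementations add things like load-balancing losses and capacity limits. The point is only that the router scores each token vector, so nothing in the mechanism hands an expert a particular domain's data.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token_vec, router_weights, k=2):
    """Pick the top-k experts for ONE token.

    router_weights has one row per expert (hypothetical layout).
    Returns [(expert_index, gate_weight), ...], highest score first,
    with gate weights renormalized over the selected experts.
    """
    # One router logit per expert: dot product of its row with the token.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


if __name__ == "__main__":
    random.seed(0)
    DIM, NUM_EXPERTS = 8, 4  # toy sizes, chosen arbitrarily
    # Stand-in for a learned router; in a real model these are trained jointly
    # with the experts, not assigned per domain.
    router_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
    # Stand-ins for token hidden states from a single sequence.
    tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
    for t, vec in enumerate(tokens):
        print(t, route(vec, router_weights))
```

Running it prints the selected experts and gate weights for each toy token. The decision is made separately for every token (and, in a real model, at every MoE layer), which is why "expert" here does not mean "domain specialist".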