by zozbot234
167 days ago
Large MoE models are more socially accepted because medium- and large-sized MoE models can still be quite small with respect to expert size (which is what sets the amount of required VRAM). A large dense model, by contrast, is still challenging to run.
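To make the arithmetic concrete, here is a rough sketch. The parameter counts are made up for illustration and are not any real model's specs, the 4-bit quantization is an assumption, and KV cache and activations are ignored. It also assumes the inactive experts can sit in system RAM while only the per-token active weights need fast memory.

```python
def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# Hypothetical models, chosen only for illustration.
dense_total = 70.0   # dense: every parameter is used for every token
moe_total = 100.0    # MoE: shared layers plus all experts
moe_active = 15.0    # MoE: shared layers plus the experts routed per token

bytes_per_param = 0.5  # assumption: 4-bit quantized weights

dense_mem = weights_gib(dense_total, bytes_per_param)
moe_total_mem = weights_gib(moe_total, bytes_per_param)
moe_active_mem = weights_gib(moe_active, bytes_per_param)

print(f"dense, all weights:     {dense_mem:.1f} GiB")
print(f"MoE, all weights:       {moe_total_mem:.1f} GiB")
print(f"MoE, active per token:  {moe_active_mem:.1f} GiB")
```

The full MoE weights still have to live somewhere, so the total is larger than the dense model's. The point is that the part touched per token is much smaller, which is what makes offloading the rest tolerable.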
The Llama 4 models are MoE models, in case you are unaware; your comment seemed to imply they were dense models.