by agunapal
45 days ago
If you really think about why MoE came into existence, it was to save significant cost during training. I don't think there was any concrete evidence of performance gains for comparable MoE vs. dense models. Over the years, I believe the new techniques employed in post-training are what have made the models better.
But I don’t think it necessarily saved training cost; if it did, I’d be interested to learn how!