by tough
297 days ago
Someone on Twitter was exploring this and linked to some related papers where you can, for example, trim experts from a MoE model if you're 100% sure they're never active for your specific task. What the bigger, wider net buys you is generalization.
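Here's a rough sketch of the trimming idea. It assumes you've logged which experts the router picks (the top-k expert indices per token) while running the model on your task's data. The function names, the log format, and the "router rows" representation are made up for illustration. They don't come from any particular library or paper.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_expert_usage(routing: Iterable[Sequence[int]], num_experts: int) -> list[int]:
    """Count how many times each expert shows up in a token's top-k selection.

    Hypothetical log format: one tuple of chosen expert indices per token.
    """
    counts = Counter()
    for chosen in routing:
        counts.update(chosen)
    return [counts.get(e, 0) for e in range(num_experts)]


def prune_unused_experts(experts: list, router_rows: list, usage: list[int]):
    """Drop experts that were never selected in the log.

    Also drops the matching router rows (one row of router weights per expert),
    so the router only produces logits for the experts that remain.
    Returns the kept experts, the kept router rows, and an old->new index map.
    """
    keep = [i for i, c in enumerate(usage) if c > 0]
    remap = {old: new for new, old in enumerate(keep)}
    return [experts[i] for i in keep], [router_rows[i] for i in keep], remap


if __name__ == "__main__":
    # Toy log: 8 experts, top-2 routing, 4 tokens.
    routing_log = [(0, 3), (3, 5), (0, 5), (3, 0)]
    usage = count_expert_usage(routing_log, num_experts=8)
    experts = [f"expert_{i}" for i in range(8)]
    router_rows = [[float(i)] for i in range(8)]  # stand-in for real weight rows
    kept, kept_rows, remap = prune_unused_experts(experts, router_rows, usage)
    print(usage)  # [3, 0, 0, 3, 0, 2, 0, 0]
    print(kept)   # ['expert_0', 'expert_3', 'expert_5']
    print(remap)  # {0: 0, 3: 1, 5: 2}
```

The "never active" check is only as good as the log. An expert that never fired on your sample might still fire on inputs you didn't see, and that coverage is the generalization you give up by trimming.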