by l33tman
1071 days ago
The quoted paper, yes, but the MoE concept, its layers, and the training approach are old. See "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean, published as a conference paper at ICLR 2017.