by l33tman 1071 days ago

The quoted paper, yes, but the MoE concept, the layers, and the training are all old.

Published as a conference paper at ICLR 2017

OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton and Jeff Dean