


	by cubefox 1069 days ago
	This paper came out well after GPT-4, so apparently this was indeed a secret before then.

3 comments

famouswaffles 1069 days ago

The user I was replying to was talking about the present and the future.

We also have no indication that sparse models outperform their dense counterparts, so it's scale either way.


HeWhoLurksLate 1069 days ago

Is there a difference here between a secret and an unknown? Couldn't it be that some researcher or computer engineer had an idea, tried it out, realized it was incredibly powerful, implemented it properly, and only published their findings once they were sure of it?

I'm more of a mechanical-engineering-adjacent professional than a programmer, and I only follow AI developments loosely.


l33tman 1069 days ago

The quoted paper, yes, but the MoE concept, the layers, and the training methods are old.

Published as a conference paper at ICLR 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton and Jeff Dean
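For readers following along loosely, here is a minimal pure-Python sketch of the top-k gating idea behind that sparsely-gated layer. A gating network scores every expert, only the k highest-scoring experts are actually evaluated, and their outputs are mixed using a softmax over the kept scores. This is a simplification under stated assumptions: it uses plain linear gating and omits the tunable noise the paper adds to the gating logits, as well as its load-balancing losses. The toy experts and the function names (`top_k_gating`, `moe_forward`) are hypothetical illustrations, not the paper's code.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def top_k_gating(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    # Keep the k largest logits; all other experts get zero weight
    # (equivalent to setting their logits to -inf before the softmax).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    k: int,
) -> Vector:
    # Hypothetical linear gating network: one logit per expert.
    logits = matvec(gate_weights, x)
    out: Optional[Vector] = None
    # Only the selected experts are evaluated; this is where the
    # compute savings of a sparse model come from.
    for i, g in top_k_gating(logits, k):
        y = experts[i](x)
        if out is None:
            out = [0.0] * len(y)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out if out is not None else []


if __name__ == "__main__":
    # Toy experts: expert s simply scales its input by s.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    gate_w = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0]]
    # Only experts 2 and 1 (the two highest logits) run here.
    print(moe_forward([1.0, 0.0], gate_w, experts, k=2))
```

With k fixed, the per-token compute stays roughly constant as more experts are added, which is why parameter count can grow much faster than compute in this kind of layer.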
