by bodecker 1112 days ago

I assume comments like these, "GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference."

I'm not sure what the most canonical paper on mixture of experts is, but here's one possibility:

1 comment

I think when people refer to MoE, they are generally referring to the Google GLaM paper, actually.