| HN Mirror



	by pama 483 days ago
	I feel like a kid in a candy shop. Some of these tricks would take way too long to reverse engineer correctly based on the papers. I hope that the releases this week start a renaissance of the use of MoE as baseline academic models.


antirez 483 days ago

From this point of view, I don't understand the gap between what the actual SOTA models do in practice and the academic models. The former are at this point all MoEs, starting with GPT-4. But the open models, apart from DeepSeek V3 and Mixtral, are always dense models.


woctordho 483 days ago

MoEs require less computation and more memory, so they're harder to set up in small labs.
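To make that trade-off concrete, here is a rough back-of-the-envelope sketch for a single feed-forward layer. The layer sizes, expert count, and top-k value are illustrative assumptions rather than figures for any particular model, and the helper names are hypothetical. It assumes a simple two-matrix FFN and a linear router, and it ignores attention weights and biases.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one two-matrix feed-forward block (biases ignored)."""
    return 2 * d_model * d_ff


def moe_ffn_params(d_model: int, d_ff: int, n_experts: int, top_k: int):
    """Return (total, active) weight counts for one MoE feed-forward layer.

    total  = weights that must sit in memory (every expert plus the router)
    active = weights actually used for one token (top_k experts plus the router)
    """
    per_expert = ffn_params(d_model, d_ff)
    router = d_model * n_experts  # assumed: a single linear routing layer
    total = n_experts * per_expert + router
    active = top_k * per_expert + router
    return total, active


# Illustrative sizes only (assumptions, not taken from the thread).
d_model, d_ff = 4096, 14336
dense = ffn_params(d_model, d_ff)
total, active = moe_ffn_params(d_model, d_ff, n_experts=8, top_k=2)

print(f"dense FFN weights:        {dense:,}")
print(f"MoE weights in memory:    {total:,}")
print(f"MoE weights used/token:   {active:,}")
print(f"memory vs dense:          {total / dense:.1f}x")
print(f"per-token compute vs MoE-sized dense: {active / total:.2f}x")
```

Under these assumptions the MoE layer holds roughly `n_experts` times the weights of the dense layer in memory, while each token only touches about `top_k` experts' worth of them, which is the memory-for-compute trade described above.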


kristianp 483 days ago

I assumed GPT-4o wasn't MoE, being a smaller version of GPT-4, but I've never heard either way.
