by SomewhatLikely 1008 days ago
There has been speculation that GPT-4 is a mixture-of-experts (MoE) model, where each expert could be hosted on a different machine. Because those machines may report their results to the aggregating machine in different orders, the results could be summed in different orders, and since floating-point addition is not associative, different summation orders can give slightly different numbers.
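
A minimal Python sketch of why arrival order matters: rounding after each floating-point addition means the same contributions, summed in a different order, can differ in the last bits. The three scalar "expert outputs" and the `aggregate` helper are hypothetical stand-ins; real expert outputs would be vectors, but the rounding effect is the same.

```python
import itertools

# Hypothetical scalar contributions from three experts (illustrative values only).
expert_outputs = [0.1, 0.2, 0.3]

def aggregate(contributions):
    """Sum contributions left to right, in the order they 'arrived'."""
    total = 0.0
    for c in contributions:
        total += c  # each addition is rounded to the nearest double
    return total

# Try every possible arrival order and collect the distinct totals.
results = {aggregate(order) for order in itertools.permutations(expert_outputs)}
print(sorted(results))  # [0.6, 0.6000000000000001]
```

The six possible arrival orders produce two distinct totals, even though the inputs are identical.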

Maybe my assumption of how MoE would or could work is wrong, but I had assumed it means getting different models to generate different bits of text and then stitching them together. For example, if you ask it to write a short bit of code where every comment is poetry, the instruction would be split (by a top-level "manager" model?) so that one model is given the task "write this code" and another is given the task "write a poem that explains what the code does". There therefore wouldn't be any maths combining numbers from the different experts, just their outputs (text) being merged.

Have I completely misunderstood? Does Mixture of Experts somehow involve the different experts actually collaborating on the raw computation?
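
For contrast, here is a toy sketch of the token-level routing commonly described in the sparse MoE literature (for example the sparsely-gated MoE layer of Shazeer et al., 2017). A small learned router scores the experts for each token, the top-k experts each transform that token's vector, and their outputs are combined by a gate-weighted numeric sum, so the experts' numbers really are added together. Everything here (the sizes, random weights, single-matrix experts, and the `moe_layer` function) is hypothetical and says nothing about how GPT-4 is actually built.

```python
import math
import random

# Toy mixture-of-experts layer. All sizes, weights, and names are hypothetical.
DIM = 4           # hidden size of a token vector
NUM_EXPERTS = 4
TOP_K = 2         # each token is routed to its 2 highest-scoring experts

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Each "expert" is just one linear map here (real experts are feed-forward blocks).
experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
# The router (gating network) produces one score per expert for a given token.
router = rand_matrix(NUM_EXPERTS, DIM)

def moe_layer(token):
    scores = matvec(router, token)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])  # renormalize over the chosen experts
    out = [0.0] * DIM
    for g, i in zip(gates, top):
        expert_out = matvec(experts[i], token)  # in a distributed setup, possibly another machine
        out = [o + g * e for o, e in zip(out, expert_out)]  # numeric, gate-weighted sum
    return out, top, gates

token = [0.5, -1.0, 0.25, 2.0]
output, chosen, weights = moe_layer(token)
print("experts used:", chosen)
print("gate weights:", [round(w, 3) for w in weights])
print("combined output:", [round(x, 3) for x in output])
```

The routing decision is made per token and per layer, not per sub-task of the prompt. With more experts contributing per token, changing the order of that final floating-point accumulation can change the low-order bits of the result.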

Could anyone recommend something to read to learn more about MoE in general? (Ideally something understandable by someone like me who isn't an expert in LLMs/ML/etc.)