Hacker News
by zamadatix 500 days ago
This is, more or less, what mixture-of-experts (MoE) is picking away at. The difference is that, rather than splitting the model by how rare or common the info is, it's split by specialization. There isn't as much focus on keeping the inactive portions on disk because it's more economical to host it all, but in a way that lets you exploit parallelism of requests across the experts. This has the added effect that you can constantly select the best expert as the answer is generated without losing efficient hosting.
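To make the hosting point concrete, here is a rough sketch of per-step routing, assuming a simplified top-1 gate (real MoE layers usually pick the top-k experts per token, per layer, with a learned router). The function name `route_tokens`, the request IDs, and the gate scores are all made up for illustration.

```python
from collections import defaultdict


def route_tokens(gate_scores):
    """Group requests by their highest-scoring expert for one decoding step.

    gate_scores: list of (request_id, [score per expert]) pairs.
    Returns {expert_index: [request_ids routed to that expert]}.
    """
    batches = defaultdict(list)
    for request_id, scores in gate_scores:
        # Top-1 gating (an assumption): pick the single best expert.
        best_expert = max(range(len(scores)), key=scores.__getitem__)
        batches[best_expert].append(request_id)
    return dict(batches)


# Hypothetical gate scores for one step across four concurrent requests.
step_scores = [
    ("req-a", [0.1, 0.7, 0.2]),
    ("req-b", [0.6, 0.3, 0.1]),
    ("req-c", [0.2, 0.5, 0.3]),
    ("req-d", [0.1, 0.1, 0.8]),
]

batches = route_tokens(step_scores)
for expert, requests in sorted(batches.items()):
    print(f"expert {expert}: {requests}")
```

Every expert stays resident, and each one processes whichever requests were routed to it at that step as a batch. The choice is redone at every step, so a request can move between experts as its answer is generated.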

I know what MoE is. Maybe read my comments more carefully and give me the benefit of the doubt.
My comment would've done an astoundingly bad job of introducing you to what mixture of experts is, had that been its goal. It's really about why MoE-style enhancements don't target keeping parts on disk when the model is being optimized to be as economical as possible to host. There's no doubt being cast in that; it's just an observation about why they optimize the way they do.

If you were put off by defining terms on first use: that's just good form, not something related to you.