Hacker News
by Me1000 1052 days ago
Not an expert (no pun intended), but MoE where each expert is actually just a LoRA adaptor on top of the base model gets me pretty excited. Since LoRA adaptors can be swapped in and out at runtime, it might be possible to get decent performance without a lot of extra memory pressure.
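To put a rough number on the memory argument, here is a back-of-the-envelope sketch. The layer width, LoRA rank, and expert count are hypothetical values chosen for illustration, not taken from any particular model, and it only counts the weights of a single linear layer.

```python
# Rough parameter count for one linear layer: full experts vs. LoRA experts.
# All sizes below are illustrative assumptions, not measurements.

def full_expert_params(d_in, d_out):
    # A full expert carries its own complete weight matrix.
    return d_in * d_out

def lora_expert_params(d_in, d_out, rank):
    # A LoRA expert carries two thin matrices: A (rank x d_in) and B (d_out x rank).
    return rank * (d_in + d_out)

d_in = d_out = 4096   # hypothetical layer width
rank = 16             # hypothetical LoRA rank
num_experts = 8       # hypothetical expert count

# Classic MoE: every expert is a full copy of the layer.
full_moe = num_experts * full_expert_params(d_in, d_out)

# MoE-LoRA: one shared base layer plus one small adapter per expert.
lora_moe = full_expert_params(d_in, d_out) + num_experts * lora_expert_params(d_in, d_out, rank)

print(f"full MoE layer: {full_moe:,} params")
print(f"MoE-LoRA layer: {lora_moe:,} params")
print(f"ratio:          {full_moe / lora_moe:.2f}x")
```

Under these assumptions the eight adapters together add about 6% on top of the shared base layer, which is where the "not a lot of extra memory pressure" intuition comes from.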
1 comment

While MoE-LoRAs are exciting in themselves, they are a very different pitch from full-on MoEs. Each LoRA expert can only differ from the shared base weights by a low-rank update, so all experts stay close to the same underlying layer. If the idea behind MoEs is that you want completely separate layers to handle different parts of the input/computation, then it is unlikely that you can get away with low-rank tweaks to an existing linear layer.
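A small sketch of why a LoRA expert is constrained compared to a separate layer: the update B·A can never have rank higher than the LoRA rank r, while an independent d×d expert can use all d dimensions. The sizes and the helper names (`matmul`, `matrix_rank`) are made up for illustration, and exact rational arithmetic is used so the rank is not affected by floating-point noise.

```python
from fractions import Fraction
import random

def matmul(X, Y):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matrix_rank(M):
    # Exact rank via Gaussian elimination over rationals.
    rows = [[Fraction(v) for v in row] for row in M]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

random.seed(0)
d, r = 6, 2  # hypothetical layer width and LoRA rank

# LoRA expert: the change to the base weights is B @ A, with B (d x r) and A (r x d).
B = [[random.randint(-3, 3) for _ in range(r)] for _ in range(d)]
A = [[random.randint(-3, 3) for _ in range(d)] for _ in range(r)]
delta = matmul(B, A)

# A fully separate expert is free to be full rank; the identity is one such matrix.
separate_expert = [[1 if i == j else 0 for j in range(d)] for i in range(d)]

print("rank of LoRA update:    ", matrix_rank(delta))            # never more than r
print("rank of separate expert:", matrix_rank(separate_expert))  # d in this example
```

Whether that restriction hurts in practice is an empirical question; the sketch only shows the structural limit, not how much expressiveness a real model needs.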