Hacker News new | ask | show | jobs
by arugulum 1045 days ago
While MoE-LoRAs are exciting in themselves, they are a very different pitch from full on MoEs. If the idea behind MoEs is that you want completely separate layers to handle different parts of the input/computation, then it is unlikely that you can get away with low-rank tweaks to an existing linear layer.