by Me1000
1052 days ago
Not an expert (no pun intended), but MoE where each expert is actually just a LoRA adapter on top of a shared base model gets me pretty excited. Since LoRA adapters are small and can be swapped in and out at runtime, it might be possible to get decent performance without a lot of extra memory pressure.
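To make the idea concrete, here is a rough NumPy sketch of what I have in mind, not how any particular model does it. The router (`W_router`), the top-k softmax gating, and all the sizes are assumptions made up for illustration; the only point is that the base weight `W` is stored once and each expert adds only a small low-rank pair `(B, A)`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 8, 2, 4  # illustrative sizes only

# Shared, frozen base weight: stored once no matter how many experts exist.
W = rng.standard_normal((d_out, d_in))

# Each "expert" is just a low-rank pair (B, A); its weight delta is B @ A.
A = rng.standard_normal((n_experts, rank, d_in)) * 0.1
B = rng.standard_normal((n_experts, d_out, rank)) * 0.1

# Hypothetical router: a linear layer that scores every expert for an input.
W_router = rng.standard_normal((n_experts, d_in))


def lora_moe_forward(x, top_k=1):
    """Base output plus a gated sum of the top-k LoRA expert deltas."""
    scores = W_router @ x
    top = np.argsort(scores)[-top_k:]                # indices of the k best experts
    gate = np.exp(scores[top] - scores[top].max())   # softmax over the chosen experts
    gate /= gate.sum()
    y = W @ x
    for g, e in zip(gate, top):
        # Apply the low-rank update without ever materializing B @ A.
        y = y + g * (B[e] @ (A[e] @ x))
    return y, top


x = rng.standard_normal(d_in)
y, chosen = lora_moe_forward(x, top_k=2)

base_params = W.size                      # parameters in the shared base layer
expert_params = A[0].size + B[0].size     # extra parameters per expert
print(f"base: {base_params}, per expert: {expert_params}")
```

In this toy setup each expert costs `rank * (d_in + d_out)` parameters versus `d_in * d_out` for the base layer, which is where the memory savings would come from as the layer gets wider.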