by magicalhippo
734 days ago
> In this case, the experts could literally be routing / weighting of the LoRas

Hmm yes, good point. Hard to tell with so little to go on.

edit: I assumed they were matrices given they were squares in the figure, just squashed to fit in the LoRA stackup, and that they'd be low-dimensional and so have few parameters.

> The thing that’s a little weird to me is that you’d need to keep retraining the experts.

Yeah, my impression was this is more for static knowledge, like if you wanted to have a Wikipedia assistant, say.

It got me thinking though, whether it could be a stepping stone towards something more dynamic. Say, could you use something like the W = W_0 + dW idea to tweak the cross-attention mechanism to select newly added experts somehow?

Again, not into the AI scene, just like entertaining these shower thoughts.
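
For anyone following along, here is a toy sketch of the W = W_0 + dW idea. It assumes the usual LoRA factorization dW = B·A with two thin matrices, which the comment above doesn't spell out; the names (`W0`, `B`, `A`), sizes, and values are all made up for illustration and are not taken from the paper under discussion.

```python
# Toy sketch of a LoRA-style update: W = W_0 + dW, with dW = B @ A.
# Every name, size, and value here is an illustrative assumption.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

d, r = 4, 1  # full dimension d and adapter rank r (r is much smaller than d in practice)

# Frozen base weights W_0 (d x d); identity just to keep the numbers readable.
W0 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Trainable low-rank factors: B is d x r, A is r x d.
B = [[0.5], [0.0], [0.0], [0.0]]
A = [[0.0, 2.0, 0.0, 0.0]]

dW = matmul(B, A)  # d x d update whose rank is at most r
W = add(W0, dW)    # effective weights after applying the adapter

full_params = d * d          # parameters a dense dW would need
lora_params = d * r + r * d  # parameters actually stored in B and A
```

The point of the factorization is the parameter count: the gap between `full_params` and `lora_params` grows quickly as `d` gets large while `r` stays small, which is why swapping or adding adapters is cheap compared to retraining `W0`.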