by vessenes
734 days ago
Thanks for this substantive reply. In this case, the experts could literally be the routing / weighting of the LoRAs, so it could be a 1x100k (or 1M or whatever) vector, maybe binary for simplicity and size. Or floats that are trained as floats but capped to 1/0 at inference time. The thing that’s a little weird to me is that you’d need to keep retraining the experts. But I guess it may just be part of the pipeline for adding custom knowledge to the system.
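To make the gate-vector idea concrete, here is a rough sketch of what I mean, not anything taken from the paper. It assumes one gate value per LoRA, trained as a float and snapped to 0/1 at inference. The names `binarize` and `mix_adapters`, the 0.5 threshold, and the toy numbers are all made up for illustration.

```python
# Hypothetical sketch: one gate per LoRA adapter, float during training,
# binary at inference. Names, threshold and values are illustrative only.

def binarize(gates, threshold=0.5):
    """Snap trained float gates to 0/1 for inference."""
    return [1 if g >= threshold else 0 for g in gates]

def mix_adapters(gates, adapter_outputs):
    """Gate-weighted sum of per-adapter output vectors."""
    dim = len(adapter_outputs[0])
    mixed = [0.0] * dim
    for g, out in zip(gates, adapter_outputs):
        if g == 0:
            continue  # a gated-off adapter never needs to be evaluated
        for i in range(dim):
            mixed[i] += g * out[i]
    return mixed

# Four toy adapters, each contributing a 2-dimensional output.
trained_gates = [0.91, 0.07, 0.64, 0.33]
adapter_outputs = [[1.0, 0.0], [0.0, 5.0], [0.5, 0.5], [9.0, 9.0]]

inference_gates = binarize(trained_gates)
mixed = mix_adapters(inference_gates, adapter_outputs)
```

With a binary vector the storage cost is one bit per adapter, which is why 100k or 1M entries would still be small.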
|
Hmm yes, good point. Hard to tell with so little to go on.
edit: I assumed they were matrices, given that they were drawn as squares in the figure (just squashed to fit in the LoRA stackup), and that being low-dimensional they'd have few parameters.
> The thing that’s a little weird to me is that you’d need to keep retraining the experts.
Yeah, my impression was that this is more for static knowledge, like if you wanted to have a Wikipedia assistant, say.
It got me thinking, though, whether it could be a stepping stone towards something more dynamic. Say, could you use something like the W = W_0 + dW idea to tweak the cross-attention mechanism to select newly added experts somehow?
Again, I'm not into the AI scene, I just like entertaining these shower thoughts.
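For anyone unfamiliar with the W = W_0 + dW idea: in LoRA the frozen weight W_0 is left alone and only a low-rank update dW = B·A is trained. A toy sketch in plain Python follows, with made-up sizes and values; the helper names `matmul` and `lora_weight` are mine, and the usual scaling factor on dW is left out for brevity. It says nothing about how the selection of new experts would actually work.

```python
# Toy illustration of W = W_0 + dW with a low-rank dW = B @ A.
# Sizes and values are made up; the usual LoRA scaling factor is omitted.

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w0, b, a):
    """Return W_0 + B @ A without modifying the frozen W_0."""
    dw = matmul(b, a)
    return [[w0[i][j] + dw[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]

# Frozen 3x3 base weight (identity here, just to keep the numbers readable).
w0 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0]]

# Rank-1 update: B is 3x1 and A is 1x3, so 6 trainable numbers instead of 9.
b = [[1.0], [2.0], [0.0]]
a = [[0.5, 0.0, 1.0]]

w = lora_weight(w0, b, a)
```

The gap between 6 and 9 is trivial here, but for a d x d weight with rank r it is 2·d·r versus d², which is the "few parameters" point above.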