by magicalhippo 734 days ago
> In this case, the experts could literally be routing / weighting of the LoRas

Hmm yes, good point. Hard to tell with so little to go on.

edit: I assumed they were matrices, since they're drawn as squares in the figure, just squashed to fit in the LoRA stackup, and since being low-dimensional they'd have few parameters.

> The thing that’s a little weird to me is that you’d need to keep retraining the experts.

Yeah, my impression was that this is more for static knowledge, like if you wanted, say, a Wikipedia assistant.

It got me thinking, though, whether it could be a stepping stone towards something more dynamic. Say, could you use something like the W = W_0 + dW idea to tweak the cross-attention mechanism so it can select newly added experts somehow?
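
To make the W = W_0 + dW part concrete, here's a rough sketch of what I have in mind. It assumes each expert is a low-rank pair (B, A) with dW = B @ A, as in LoRA, and that some router hands out a weight per expert. The names `merge_experts`, `experts` and `weights` are made up for illustration, the numbers are toy values, and none of this is from the paper.

```python
# Toy sketch of W = W_0 + dW with several low-rank "experts".
# Assumption: each expert is a pair (B, A) and contributes weight * (B @ A).
# The function names and the routing weights are hypothetical.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge_experts(W0, experts, weights):
    """Return W_0 + sum_i weights[i] * (B_i @ A_i), leaving W0 untouched."""
    W = [row[:] for row in W0]
    for (B, A), w in zip(experts, weights):
        dW = matmul(B, A)  # low-rank update, same shape as W0
        W = [[wij + w * d for wij, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, dW)]
    return W

# Frozen base weights (2x2 identity, just for illustration).
W0 = [[1.0, 0.0], [0.0, 1.0]]

# Two rank-1 experts: B is 2x1, A is 1x2.
expert_a = ([[1.0], [0.0]], [[0.0, 2.0]])
expert_b = ([[0.0], [1.0]], [[3.0, 0.0]])

# The router picks expert_a at half strength and ignores expert_b.
W = merge_experts(W0, [expert_a, expert_b], [0.5, 0.0])
```

Adding an expert later is just appending another (B, A) pair and a weight; W_0 never changes. The open question is how the router would learn to give a sensible weight to an expert it has never seen.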

Again, I'm not into the AI scene, I just like entertaining these shower thoughts.