by vessenes 734 days ago
Thanks for this substantive reply.

In this case, the experts could literally be the routing / weighting of the LoRAs, so each one could be a 1x100k (or 1M, or whatever) vector, maybe binary for simplicity and size. Or floats that are snapped to 1/0 at inference time, but trained as floats.
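
To make that concrete, here's a rough sketch of the "trained as floats, 1/0 at inference" idea. Everything in it is my own assumption, not anything from the paper: the names (`soft_gate`, `hard_gate`, `selected_loras`), the sigmoid, the 0.5 threshold, and a toy bank of 8 LoRAs standing in for 100k.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_gate(logits):
    """Training-time gate: one float in (0, 1) per LoRA."""
    return [sigmoid(z) for z in logits]

def hard_gate(logits, threshold=0.5):
    """Inference-time gate: the same floats snapped to 1/0."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

def selected_loras(mask):
    """Indices of the LoRAs this expert switches on."""
    return [i for i, bit in enumerate(mask) if bit]

# Hypothetical expert: one trainable logit per LoRA in the bank.
# 8 entries here as a stand-in for 100k or 1M.
expert_logits = [2.0, -3.0, 0.1, -0.2, 4.0, -5.0, -1.0, 0.0]

weights = soft_gate(expert_logits)   # what you'd backprop through
mask = hard_gate(expert_logits)      # what you'd store and ship
print(mask)
print(selected_loras(mask))
```

At 1 bit per entry, a 100k-wide mask is about 12.5 KB per expert, which is the "simplicity and size" appeal.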

The thing that’s a little weird to me is that you’d need to keep retraining the experts. But I guess that may just be part of the pipeline for adding custom knowledge to the system.

1 comments

> In this case, the experts could literally be routing / weighting of the LoRAs

Hmm yes, good point. Hard to tell with so little to go on.

edit: I assumed they were matrices, since they're drawn as squares in the figure (just squashed to fit in the LoRA stack-up), and since being low-dimensional they'd have few parameters anyway.

> The thing that’s a little weird to me is that you’d need to keep retraining the experts.

Yeah, my impression was that this is more for static knowledge, like if you wanted to have a Wikipedia assistant, say.

It got me thinking though, whether it could be a stepping stone towards something more dynamic. Say, could you use something like the W = W_0 + dW idea to tweak the cross-attention mechanism to select newly added experts somehow?
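
A toy sketch of what I have in mind, covering only the W = W_0 + dW part. All of it is assumed for illustration: a made-up scoring matrix with one row per expert, a rank-1 dW = B A, and a hypothetical `pick_expert` helper. It's not real cross-attention, and whether this would hold up at scale is exactly what I don't know.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def pick_expert(weights, query):
    """Index of the highest-scoring expert for this query."""
    scores = matvec(weights, query)
    return scores.index(max(scores))

# Frozen scoring matrix W_0: one row per expert, one column per query feature.
W0 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],  # newly added expert; nothing routes to it yet
]

# Low-rank update dW = B A (rank 1 here): 3 + 4 = 7 numbers instead of 12.
B = [[0.0], [0.0], [1.0]]    # 3x1: only touches the new expert's row
A = [[0.0, 0.0, 2.0, 0.0]]   # 1x4: responds to query feature 2

dW = matmul(B, A)
W = add(W0, dW)              # W_0 itself is left untouched

query = [0.1, 0.2, 1.0, 0.0]
print(pick_expert(W0, query))  # routing before the update
print(pick_expert(W, query))   # routing after the update
```

The point is just that only B and A would need training when an expert is added, while W_0 stays frozen.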

Again, I'm not into the AI scene, I just like entertaining these shower thoughts.