by liuliu 736 days ago
I think there is an expert router layer that decides which LoRAs to apply at inference time. But they also mention that they freeze the router weights during training, so it is unclear to me how the router was trained, and on what loss.
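For concreteness, here is a minimal sketch of what such a router could look like. It assumes a softmax gate over a fixed set of LoRA adapters with top-k selection, where each adapter adds a low-rank delta `B @ A @ x` to a frozen base layer. The names (`routed_lora_forward`, `router_w`, `top_k`) are hypothetical and this is not a description of their actual system:

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per row.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routed_lora_forward(
    x: Vector,
    base_w: Matrix,
    router_w: Matrix,
    adapters: List[Tuple[Matrix, Matrix]],
    top_k: int = 1,
) -> Vector:
    """Hypothetical routed-LoRA layer: y = W x + sum_i g_i * B_i A_i x over the top-k adapters."""
    gates = softmax(matvec(router_w, x))  # one gate score per adapter
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)  # renormalize over the selected adapters
    y = matvec(base_w, x)  # frozen base weights
    for i in chosen:
        a, b = adapters[i]
        delta = matvec(b, matvec(a, x))  # low-rank update B @ (A @ x)
        y = [yi + (gates[i] / norm) * di for yi, di in zip(y, delta)]
    return y
```

In this framing, fine-tuning a new adapter with `router_w` frozen would only update that adapter's `A` and `B` matrices, which is why the question of how `router_w` itself was learned remains open.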

Interesting. That’s kind of surprising to me: it would mean that with every new LoRA they’d need to fine-tune the router, no?

Embedding a description of each LoRA and using RAG to pull the nearest LoRAs in the embedding space is where my mind goes. It’s super extensible, needs minimal additional training for customer use cases, and given the way LoRAs probably work, it’s not terrible to pull in a few extras.
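A rough sketch of that retrieval idea: embed each adapter's description once, then rank adapters by cosine similarity to the request. The `embed` function here is a toy bag-of-words stand-in (an assumption; a real setup would use a learned text-embedding model), and `LoraIndex` and the adapter names are hypothetical:

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LoraIndex:
    """Hypothetical index mapping adapter names to embedded descriptions."""

    def __init__(self) -> None:
        self.entries: Dict[str, Counter] = {}

    def add(self, name: str, description: str) -> None:
        # Registering a new adapter only means embedding its description; no router retraining.
        self.entries[name] = embed(description)

    def nearest(self, query: str, k: int = 2) -> List[str]:
        # Return the k adapters whose descriptions are most similar to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda n: cosine(q, self.entries[n]), reverse=True)
        return ranked[:k]


index = LoraIndex()
index.add("summarize", "summarize long email threads")
index.add("proofread", "fix grammar and spelling in text")
index.add("tone", "rewrite text in a friendly tone")
print(index.nearest("please fix the spelling in this text", k=2))  # ['proofread', 'tone']
```

Asking for `k=2` instead of 1 is the "pull a few extras" part: a slightly wrong neighbor is cheap to include if the adapters are light.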

Anyway, I’m just speculating; I have no idea what they’re actually doing on the backend.

That's where it is confusing to me. They mentioned that for LoRA fine-tuning, the router weights are frozen, so you don't update the routing when training on a different concept. But how is that expert router trained? It could be pretrained with some auxiliary loss to encourage diversity.
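One well-known candidate for such an auxiliary loss, offered purely as an illustration since nothing here confirms they use it, is the Switch Transformer load-balancing loss, `alpha * N * sum_i f_i * P_i`, where `f_i` is the fraction of tokens whose top-1 expert is `i` and `P_i` is the mean router probability for expert `i`. It is minimized when tokens spread evenly across experts. The function name and `alpha` default below are assumptions:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def load_balancing_loss(router_logits: List[List[float]], alpha: float = 0.01) -> float:
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i (top-1 routing)."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    # f_i: fraction of tokens dispatched (by argmax) to expert i. Not differentiable.
    counts = [0] * num_experts
    for p in probs:
        counts[max(range(num_experts), key=p.__getitem__)] += 1
    f = [c / num_tokens for c in counts]
    # P_i: mean router probability for expert i. Gradients flow through this term.
    mean_p = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, mean_p))
```

With perfectly balanced routing the loss bottoms out at `alpha`; if every token collapses onto one expert it approaches `alpha * N`, so adding it to the training objective pushes the router toward using all experts.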
I am old enough to remember using the Yellow Pages to find the right expert.