HN Mirror

Hacker News


	by liuliu 736 days ago
	I think there is an expert router layer that decides which LoRAs to integrate at inference time. But they also mention that they freeze the router weights during training, so it is unclear to me how the router was trained, and on what loss.
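
For readers unfamiliar with the idea, a router over LoRA adapters can be sketched roughly like this. It is a generic mixture-of-LoRA gating sketch, not what the post describes: the names (`routed_lora_forward`, `router_W`, `top_k`) are hypothetical, the router is assumed to be a single linear layer plus softmax, and each selected adapter adds its low-rank delta B_i(A_i x) to the frozen base output W x, weighted by its renormalized gate. Note that `router_W` has one row per adapter, so the router's shape is tied to the number of LoRAs it knows about.

```python
import math


def matvec(M, v):
    # Multiply matrix M (a list of rows) by vector v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def routed_lora_forward(W, loras, router_W, x, top_k=1):
    """Hypothetical routed LoRA layer: y = W x + sum_{i in top-k} g_i(x) * B_i (A_i x).

    W: frozen base weight. loras: list of (A_i, B_i) low-rank pairs.
    router_W: one row of router weights per adapter (assumed linear router).
    """
    base = matvec(W, x)
    gates = softmax(matvec(router_W, x))  # one gate per adapter
    chosen = sorted(range(len(loras)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)  # renormalize over the selected adapters
    y = list(base)
    for i in chosen:
        A, B = loras[i]
        delta = matvec(B, matvec(A, x))  # low-rank update B_i (A_i x)
        w = gates[i] / norm
        y = [yj + w * dj for yj, dj in zip(y, delta)]
    return y, chosen
```

With `top_k=1` only the highest-scoring adapter contributes; larger `top_k` blends several adapters by their gate weights.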

2 comments

vessenes 736 days ago

Interesting. That’s kind of surprising to me: it would mean that with every new LoRA they’d need to fine-tune the router, no?

Where my mind goes is embedding a description of each LoRA and using RAG to pull the nearest LoRAs in the embedding space. It’s super extensible, needs minimal additional training for customer use cases, and given the way the LoRAs probably work, it’s not terrible to pull a few extras.
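
A minimal sketch of that retrieval idea, assuming each LoRA's text description has already been turned into an embedding vector by some embedding model (not shown; the toy vectors and adapter names below are made up). `LoraIndex` is a hypothetical class, and selection is plain cosine-similarity nearest neighbour, so registering a new adapter is an insert rather than a retrain.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class LoraIndex:
    """Hypothetical registry mapping a LoRA name to the embedding of its description."""

    def __init__(self):
        self.entries = {}

    def add(self, name, description_embedding):
        # Adding an adapter only stores one more vector; nothing is retrained.
        self.entries[name] = description_embedding

    def nearest(self, query_embedding, k=2):
        # Rank all adapters by similarity to the query and keep the top k names.
        scored = sorted(
            self.entries.items(),
            key=lambda kv: cosine(query_embedding, kv[1]),
            reverse=True,
        )
        return [name for name, _ in scored[:k]]
```

For example, with `summarize` at `[1, 0, 0]`, `proofread` at `[0.9, 0.1, 0]`, and `tone` at `[0, 0, 1]`, querying `[1, 0, 0]` with `k=2` returns `['summarize', 'proofread']`.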

Anyway, I’m just speculating; no idea what they’re actually doing on the backend.

liuliu 735 days ago

That's where it is confusing to me. They mentioned that for LoRA fine-tuning, the router weights are frozen, so you don't update the routing when training a different concept. But how is that expert router trained in the first place? It could be pretrained with some auxiliary loss to encourage diversity.
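
One common auxiliary loss of this kind is the load-balancing loss from the Switch Transformer, sketched below; whether anything like it was used here is an assumption, not something the thread confirms. With N experts, f_i the fraction of tokens whose top-1 choice is expert i, and P_i the mean router probability for expert i, the loss is alpha * N * sum_i f_i * P_i, which is smallest when routing is spread evenly. The function name and the `alpha` default are illustrative.

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def load_balancing_loss(router_logits, alpha=0.01):
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i.

    router_logits: one list of N expert logits per token.
    f_i: fraction of tokens whose top-1 expert is i.
    P_i: mean router probability assigned to expert i.
    """
    T = len(router_logits)
    N = len(router_logits[0])
    counts = [0] * N
    prob_sums = [0.0] * N
    for logits in router_logits:
        p = softmax(logits)
        top = max(range(N), key=lambda i: p[i])  # top-1 dispatch
        counts[top] += 1
        for i in range(N):
            prob_sums[i] += p[i]
    f = [c / T for c in counts]
    P = [s / T for s in prob_sums]
    return alpha * N * sum(fi * Pi for fi, Pi in zip(f, P))
```

Balanced routing over two experts gives about alpha (0.01 here), while sending every token to the same expert gives a larger value. In real training f_i is not differentiable, so the gradient reaches the router through P_i.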

badriprof 734 days ago

I am old enough to remember using the yellow pages to find the right expert.