


	by liuliu 736 days ago
	That's the part that confuses me. They mention that for LoRA fine-tuning the router weights are frozen, so the routing isn't updated when training on different concepts. But how is the expert router itself trained? It could be pretrained with some auxiliary loss to encourage diversity.
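
To make the "aux loss to encourage diversity" idea concrete, here is a minimal sketch of one common choice: a load-balancing term of the kind used in Switch Transformer-style mixture-of-experts training. This is an assumption for illustration only; nothing in the thread says the model under discussion trains its router this way. The function name `load_balancing_loss` and the toy logits are hypothetical, and the usual scaling coefficient is left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one token's router logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits):
    """Illustrative auxiliary loss: num_experts * sum_i f_i * P_i.

    router_logits: list of per-token lists, shape (num_tokens, num_experts).
    f_i is the fraction of tokens whose top-1 expert is i.
    P_i is the mean router probability assigned to expert i.
    The value is 1.0 under perfectly uniform routing and grows as
    routing collapses onto fewer experts.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f_i: hard top-1 assignment counts, normalized by token count.
    frac_tokens = [0.0] * num_experts
    for p in probs:
        frac_tokens[p.index(max(p))] += 1.0 / num_tokens

    # P_i: soft probabilities averaged over tokens (the differentiable part
    # in a real training setup).
    mean_probs = [
        sum(p[i] for p in probs) / num_tokens for i in range(num_experts)
    ]

    return num_experts * sum(f * q for f, q in zip(frac_tokens, mean_probs))


# Toy inputs (made up): 4 tokens, 4 experts.
balanced = [[5.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]

balanced_loss = load_balancing_loss(balanced)    # each expert gets one token
collapsed_loss = load_balancing_loss(collapsed)  # every token picks expert 0
```

In this sketch the collapsed routing scores a higher loss than the balanced one, so adding the term to the pretraining objective would push the router to spread tokens across experts. Once pretrained, the router could then be frozen during LoRA fine-tuning as described above.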