by liuliu 736 days ago
That's the part that confuses me. They mention that for LoRA fine-tuning the router weights are frozen, so the routing isn't updated when training on different concepts. But how is the expert router itself trained? It could be pretrained with some auxiliary loss that encourages diversity.
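
For reference, here is a rough sketch of what such an auxiliary loss can look like. It follows the load-balancing loss from the Switch Transformer line of work (number of experts times the sum over experts of the routed token fraction times the mean router probability). That choice is an assumption for illustration; nothing here confirms that the work under discussion trains its router this way. The function name and input layout are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def load_balancing_loss(router_logits):
    """Switch-Transformer-style auxiliary loss (illustrative, top-1 routing).

    router_logits: list of per-token lists, one logit per expert.
    Returns num_experts * sum_i f_i * P_i, which is about 1.0 when tokens
    are spread evenly and grows as routing collapses onto few experts.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f[i]: fraction of tokens whose top-1 expert is i.
    f = [0.0] * num_experts
    for p in probs:
        f[p.index(max(p))] += 1.0 / num_tokens

    # P[i]: mean router probability assigned to expert i.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    return num_experts * sum(fi * pi for fi, pi in zip(f, P))

# Two tokens, two experts: evenly split routing vs. collapsed routing.
balanced = load_balancing_loss([[5.0, 0.0], [0.0, 5.0]])
collapsed = load_balancing_loss([[5.0, 0.0], [5.0, 0.0]])
```

In training, a term like this would be scaled by a small coefficient and added to the main loss, so the router is nudged toward using all experts. With the router frozen afterwards, LoRA fine-tuning would then reuse whatever routing that pretraining produced.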