by liuliu
736 days ago
That's the part that confuses me. They mention that for LoRA fine-tuning the router weights are frozen, so the routing isn't updated when training on different concepts. But how is the expert router itself trained? It could be pretrained with some auxiliary loss to encourage diversity.
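To make the "aux loss to encourage diversity" guess concrete, here is a minimal sketch of one common choice, the Switch Transformer style load-balancing loss. This is an assumption about what such a loss could look like, not something the paper under discussion confirms. The function name `load_balancing_loss` and the toy logits are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits):
    """Hypothetical auxiliary loss: num_experts * sum_i f_i * P_i.

    f_i is the fraction of tokens whose top-1 expert is i, and P_i is the
    mean router probability assigned to expert i. The value is about 1.0
    when routing is uniform and grows as routing collapses onto one expert.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    token_fraction = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for row in router_logits:
        probs = softmax(row)
        top1 = max(range(num_experts), key=lambda i: probs[i])
        token_fraction[top1] += 1.0 / num_tokens
        for i in range(num_experts):
            mean_prob[i] += probs[i] / num_tokens
    return num_experts * sum(f * p for f, p in zip(token_fraction, mean_prob))


# Toy data (assumed): 8 tokens, 4 experts.
# "balanced" sends tokens to experts round-robin; "collapsed" sends all to expert 0.
balanced = [[5.0 if i == t % 4 else 0.0 for i in range(4)] for t in range(8)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(8)]

balanced_loss = load_balancing_loss(balanced)
collapsed_loss = load_balancing_loss(collapsed)
```

In a setup like this, the loss would be added (with a small coefficient) to the main objective during pretraining. The router would then be frozen for the LoRA stage, which matches the frozen-router detail mentioned above.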