Hacker News
by vessenes 736 days ago
Interesting. That’s kind of surprising to me: it would mean that with every new LoRA they’d need to fine-tune the router, no?

Embedding a description of each LoRA and using RAG-style retrieval to pull the nearest LoRAs in embedding space is where my mind goes; it’s super extensible, needs minimal additional training for customer use cases, and given how LoRAs probably work, it’s not terrible to pull in a few extras.
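To make the idea concrete, here is a rough sketch of that nearest-LoRA lookup. Everything in it is an assumption for illustration: the adapter names and descriptions are made up, and `embed` is a toy bag-of-words stand-in for whatever real embedding model would be used. It is not a claim about what they run on the backend.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real text-embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical adapter registry: name -> plain-text description of the LoRA.
LORA_DESCRIPTIONS = {
    "legal-summarizer": "summarize legal contracts and court filings",
    "sql-helper": "write and debug sql queries for databases",
    "support-tone": "reply to customer support tickets in a friendly tone",
}

def nearest_loras(query, descriptions, k=2):
    """Return the names of the k adapters whose descriptions are closest to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in descriptions.items()]
    # Highest similarity first; ties are broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]
```

Adding a new LoRA here is just one more entry in the registry, with no router retraining, which is the extensibility argument.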

Anyway, I’m just speculating; no idea what they’re actually doing on the backend.

1 comment

That's what confuses me. They mentioned that for LoRA fine-tuning the router weights are frozen, so the routing isn't updated when training a different concept. But how is that expert router trained in the first place? It could be pretraining with some auxiliary loss to encourage diversity.
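For reference, one common form of such an auxiliary loss is the load-balancing term used in mixture-of-experts models: the number of experts times the sum, over experts, of (fraction of tokens routed to the expert) times (mean router probability for the expert). The sketch below assumes top-1 routing and is only a guess at the kind of loss meant here; the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def load_balancing_loss(router_logits):
    """router_logits: one list per token, holding one logit per expert."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f[i]: fraction of tokens whose top-1 expert is i.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    f = [c / num_tokens for c in counts]

    # P[i]: mean router probability assigned to expert i.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    # Smallest when tokens are spread evenly, larger when routing collapses.
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))

balanced = load_balancing_loss([[2.0, 0.0], [0.0, 2.0]])
collapsed = load_balancing_loss([[2.0, 0.0], [2.0, 0.0]])
```

In this toy case `balanced` is lower than `collapsed`, so minimizing the term during pretraining pushes the router to use its experts more evenly. The router could then be frozen during LoRA fine-tuning, as described.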