by whimsicalism
1163 days ago
I'm unsure of the value of dynamically reducing the rank of the LoRA matrices at inference time, given that most of the parameter count probably comes from the original weights rather than the LoRA diff. Nonetheless, the training-time improvements look interesting.

Edit: Oh, I see, the training-time improvement is measured against a grid search over the LoRA rank, not against a single run. I'm not convinced that you shouldn't just train at the highest rank your compute budget allows. If you can train a DynLoRA with rank 8, why not just train a LoRA with that rank?
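To put a rough number on the parameter-count point, here is a back-of-the-envelope sketch. It assumes a single square 4096 x 4096 weight matrix and a standard LoRA update of the form B @ A with B of shape (d_out, r) and A of shape (r, d_in); the layer size and the helper name `lora_param_counts` are illustrative choices, not taken from the paper.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (base_params, lora_params) for one linear layer.

    Assumes the usual LoRA factorization: delta_W = B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    base_params = d_in * d_out
    lora_params = rank * (d_in + d_out)
    return base_params, lora_params


# Hypothetical layer size, chosen only for illustration.
D_IN, D_OUT = 4096, 4096

for rank in (1, 8, 64):
    base, lora = lora_param_counts(D_IN, D_OUT, rank)
    print(f"rank={rank:>2}: base={base}, lora={lora}, lora/base={lora / base:.4%}")
```

Under these assumptions, even rank 64 adds only a few percent on top of the base matrix, so dropping from rank 8 to rank 1 at inference time barely changes the total.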
Maybe if the "optimal rank" of LoRA applies to any adaptation and you are interested in training multiple adaptations for different use cases?