|
|
|
|
|
by mountainriver
261 days ago
|
|
> LoRA works well when not capacity constrained, i.e., the number of trainable parameters exceeds the amount of information to be learned, which can be estimated in terms of dataset size I’m shocked they didn’t look at progressive merging of LoRAs. Research shows that’s the best way of improving its ability to model higher level features. Seems like a massive miss, not to mention there is other research that contradicts a lot of their findings. This feels a bit like a researchers first pass at learning LoRA |
|
Progressive merging of LoRa is somewhere inbetween and categorically more complex than just LoRa so would be dominated by standard LoRa in that case.
While progressive merging could train faster as fewer params are trainable at any given time, it results in very larger adapter diffs OTO the size of the original model and doesn't retain the benefits of being able to deploy multiple adapters over the same base model idt.