by mountainriver 261 days ago
> LoRA works well when not capacity constrained, i.e., the number of trainable parameters exceeds the amount of information to be learned, which can be estimated in terms of dataset size

I’m shocked they didn’t look at progressive merging of LoRAs. Research shows that’s the best way of improving LoRA's ability to model higher-level features.

Seems like a massive miss, not to mention there is other research that contradicts a lot of their findings. This feels a bit like a researcher's first pass at learning LoRA.

2 comments

I'm not sure why progressive LoRA merging needs to be addressed here. They show there is a regime of problems where LoRA performs equivalently to FFT (full fine-tuning).

Progressive merging of LoRA is somewhere in between, and categorically more complex than plain LoRA, so it would be dominated by standard LoRA in that case.

While progressive merging could train faster, since fewer params are trainable at any given time, it results in very large adapter diffs, on the order of the size of the original model, and I don't think it retains the benefit of being able to deploy multiple adapters over the same base model.
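To put rough numbers on the adapter-size point, here is a back-of-the-envelope sketch. The layer width (4096), rank (16), and merge counts are made-up illustrative values, not taken from the paper, and the function names are hypothetical. It only counts parameters for a single weight matrix and gives a worst-case upper bound, since the accumulated diff could have lower rank in practice.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def merged_delta_params_upper_bound(d_in, d_out, rank, num_merges):
    """Worst-case storage for the accumulated weight diff after num_merges merges.

    A sum of num_merges rank-`rank` updates has rank at most
    min(num_merges * rank, d_in, d_out). The diff can be stored either as a
    low-rank factorization of that rank or as a dense matrix, whichever is smaller.
    """
    max_rank = min(num_merges * rank, d_in, d_out)
    return min(max_rank * (d_in + d_out), d_in * d_out)


if __name__ == "__main__":
    d = 4096          # hypothetical square weight matrix (assumption)
    r = 16            # hypothetical LoRA rank (assumption)
    dense = d * d
    for k in (1, 8, 64, 256):
        p = merged_delta_params_upper_bound(d, d, r, k)
        print(f"{k:4d} merges: {p:,} params ({p / dense:.1%} of dense)")
```

Under these assumptions a single adapter is under 1% of the dense matrix, but the bound reaches the full dense size once the number of merges times the rank hits the layer width, at which point the diff is no longer a small swappable adapter.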

I am curious, would you mind sharing a citation?

Don’t forget ReLoRA! https://arxiv.org/abs/2307.05695
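For anyone unfamiliar with the idea behind merging and restarting adapters: the accumulated update can reach a higher rank than any single adapter. A toy sketch of just that linear-algebra fact, in pure Python with exact arithmetic, is below. It is not the ReLoRA implementation; it leaves out training itself and any optimizer or learning-rate handling, and the hand-picked rank-1 "adapters" and function names are stand-ins I made up for illustration.

```python
from fractions import Fraction


def outer(u, v):
    """Rank-1 matrix u v^T, standing in for one trained adapter B @ A."""
    return [[a * b for b in v] for a in u]


def add(m1, m2):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def rank(matrix):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    rk = 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, rows):
            factor = m[r][c] / m[rk][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def merged_update_ranks(adapters, dim):
    """Merge rank-1 adapters one at a time; return the rank of the running sum."""
    total = [[0] * dim for _ in range(dim)]
    ranks = []
    for b, a in adapters:
        total = add(total, outer(b, a))  # "merge", then start a fresh adapter
        ranks.append(rank(total))
    return ranks


if __name__ == "__main__":
    # Hand-picked illustrative adapters (assumption), each of rank 1.
    adapters = [
        ([1, 0, 0], [1, 0, 0]),
        ([0, 1, 0], [0, 1, 0]),
        ([0, 0, 1], [0, 0, 1]),
        ([1, 1, 1], [1, 1, 1]),
    ]
    print(merged_update_ranks(adapters, dim=3))
```

Each individual update here is rank 1, yet the merged sum climbs until it is capped by the matrix dimension, which is the same reason the merged diff stops looking like a small adapter.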