Hacker News
by Majromax 1163 days ago
Highest possible in which combination, though? If you’re fine-tuning a model with N layers, then you could apply LoRA to any or all of them. Maybe it’s better to concentrate effort unevenly, in which case a uniform increase of adaptation rank (up to the compute budget) could still be subpar.
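
To make the budget point concrete, here is a rough sketch. It assumes the standard LoRA parameter count of r * (d_in + d_out) per adapted weight matrix; the layer shapes and rank choices are made up for illustration and do not come from any paper or real model.

```python
def lora_params(d_in, d_out, rank):
    # A rank-r adapter on a (d_out x d_in) weight adds two factors:
    # B with shape (d_out x r) and A with shape (r x d_in).
    return rank * (d_in + d_out)


def total_params(layer_shapes, ranks):
    # Sum the adapter parameters over all adapted layers.
    return sum(
        lora_params(d_in, d_out, r)
        for (d_in, d_out), r in zip(layer_shapes, ranks)
    )


# Hypothetical model: four adapted square layers of width 1024.
shapes = [(1024, 1024)] * 4

uniform = [8, 8, 8, 8]    # same rank everywhere
uneven = [2, 2, 12, 16]   # effort concentrated in later layers

uniform_total = total_params(shapes, uniform)
uneven_total = total_params(shapes, uneven)

print(uniform_total, uneven_total)
```

Both allocations cost the same number of trainable parameters, so a sweep over a single uniform rank never visits the uneven one. Whether the uneven split actually trains better is an empirical question this sketch does not answer.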

Right, but the way this paper proposes to determine the best rank is by training a LoRA at full rank.