by fancyfredbot 1163 days ago
You get diminishing returns as you increase the rank, so with a fixed training budget it's not clear whether you get the best return from increasing rank vs increasing something else. If you start off by training DyLoRA with max rank 8, you might see returns diminish fast beyond rank 5. Then you can use rank 5 for the rest of your training. You wouldn't know that with LoRA unless you trained a separate adapter for each rank. I think this is the idea behind the paper. If you are just going to use your entire budget training a DyLoRA with max rank 8, then you're right, there's no advantage over LoRA with rank 8. You'd have to use the ability to assess multiple ranks in order to see some benefit.
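To make the "see where returns diminish, then pick that rank" step concrete, here's a rough sketch. It assumes you have already evaluated the trained adapter at each rank and collected one validation loss per rank. The helper name `pick_rank`, the `min_gain` threshold and the loss numbers are all made up for illustration; none of it comes from the paper.

```python
def pick_rank(loss_by_rank, min_gain=0.01):
    """Return the smallest rank after which going one rank higher
    improves the loss by less than min_gain (hypothetical heuristic)."""
    ranks = sorted(loss_by_rank)
    for lower, higher in zip(ranks, ranks[1:]):
        gain = loss_by_rank[lower] - loss_by_rank[higher]
        if gain < min_gain:
            return lower
    # Every step still helped by at least min_gain: keep the max rank.
    return ranks[-1]


# Illustrative validation losses for ranks 1..8 (invented numbers).
losses = {1: 2.10, 2: 1.80, 3: 1.62, 4: 1.51,
          5: 1.45, 6: 1.446, 7: 1.444, 8: 1.443}

chosen = pick_rank(losses)
print(chosen)  # 5
```

With plain LoRA you'd need eight separate training runs to fill in a table like `losses`; the point above is that a single run with max rank 8 gives you all eight entries.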

I can see that. But are we sure that a rank-based difference that doesn't manifest early in the training process won't manifest as you get further along? See also 'grokking' [0]

[0]: https://arxiv.org/abs/2201.02177

Not sure there's any way to know beforehand whether that would happen, but the advantage of DyLoRA is that at least you'll know afterwards whether you really needed the full rank, whereas with LoRA you wouldn't. In some cases that might not be valuable information, but I guess you'd rather know than not.