by vladf 1163 days ago
The optimal rank could differ across layers

I would be shocked if the "optimal rank" in terms of performance weren't simply the maximum rank the DyLoRA was trained with, used across all layers.
Err, I suppose that's trivially true: the higher-rank adapters contain the lower-rank subnets, so they dominate in terms of quality.

But if you have some capacity constraint (e.g., memory, I guess?) then you can imagine dynamic rank allocation helping in the case where the maximum rank across all layers isn't within budget.

It's a bit of a stretch though, I agree.
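To make the budget case concrete, here is a rough sketch of the arithmetic. It relies on the standard LoRA fact that a rank-r adapter on a d_in x d_out weight adds r * (d_in + d_out) parameters. The layer shapes, the budget, and the helper names are all made up for illustration, not taken from the paper.

```python
# Sketch: per-layer rank allocation under a parameter budget.
# All shapes, ranks, and the budget below are hypothetical.

def lora_params(d_in, d_out, rank):
    """Extra parameters from one rank-`rank` LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)

def total_params(layer_shapes, ranks):
    """Total adapter parameters when layer i uses ranks[i]."""
    return sum(lora_params(d_in, d_out, r)
               for (d_in, d_out), r in zip(layer_shapes, ranks))

layer_shapes = [(768, 768)] * 4                   # four hypothetical square layers
budget = total_params(layer_shapes, [4] * 4)      # pretend we can afford uniform rank 4

uniform_max = total_params(layer_shapes, [8] * 4)    # max rank everywhere
mixed = total_params(layer_shapes, [8, 2, 2, 4])     # uneven allocation

print("uniform rank 8 fits:", uniform_max <= budget)
print("mixed ranks fit:    ", mixed <= budget)
```

The uneven allocation spends the same budget as uniform rank 4 but can put rank 8 on one layer, which is the only situation where a per-layer search would buy you anything.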

As someone else mentioned [0], the procedure would basically be: train a DyLoRA for a few initial iterations, search over the layers for the best-scoring combination of ranks, and then prune to just those ranks and train to completion.

Seems complicated, but I could see it being useful.
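For what it's worth, the search step could look something like the sketch below. It brute-forces every rank combination that fits a parameter budget and keeps the best-scoring one. The scoring function here is a toy stand-in I made up; in the procedure described above it would presumably be a validation metric from the briefly trained DyLoRA truncated to those ranks. The function names, layer shapes, and sensitivity weights are all hypothetical.

```python
# Sketch: exhaustive search for the best per-layer rank combination under a budget.
# score_fn is a placeholder; a real one would evaluate the truncated DyLoRA model.
import math
from itertools import product

def best_rank_combo(layer_shapes, candidate_ranks, budget, score_fn):
    """Return (ranks, score) maximizing score_fn among combos within the parameter budget."""
    best, best_score = None, float("-inf")
    for ranks in product(candidate_ranks, repeat=len(layer_shapes)):
        cost = sum(r * (d_in + d_out)
                   for (d_in, d_out), r in zip(layer_shapes, ranks))
        if cost > budget:
            continue  # over budget, skip this combination
        score = score_fn(ranks)
        if score > best_score:
            best, best_score = ranks, score
    return best, best_score

# Hypothetical setup: three layers, one of which benefits far more from extra rank.
layer_shapes = [(768, 768)] * 3
sensitivity = [3.0, 1.0, 0.5]        # invented per-layer weights
budget = 3 * 4 * (768 + 768)         # same cost as uniform rank 4

def toy_score(ranks):
    # Diminishing returns in rank, weighted by the invented sensitivities.
    return sum(w * math.log2(1 + r) for w, r in zip(sensitivity, ranks))

best, best_score = best_rank_combo(layer_shapes, (1, 2, 4, 8), budget, toy_score)
print(best, round(best_score, 3))
```

Brute force is exponential in the number of layers, so for a real model you'd want a greedy or evolutionary search instead; this only shows the shape of the idea.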

[0]: https://news.ycombinator.com/item?id=35517353