by rasbt 845 days ago
Not sure, but in general, it looks like ZipLoRA is only useful in specific contexts, such as when you have two different tasks you want to optimize for (like style and content in a vision context). DoRA is more general: it basically normalizes and scales the LoRA matrices to get much better performance. According to the paper, it even works well at low ranks, which effectively makes it even more parameter-efficient than OG LoRA.

I just read the article, nice write-up! I think it would benefit from a short explanation of what the magnitude vector (m) and the directional matrix (V) are; I'm not familiar with that kind of decomposition.
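
For anyone else unfamiliar with the decomposition, here is a rough NumPy sketch of how I understand it: the merged weight is a learned per-column magnitude m times a column-normalized direction V, where V is the pretrained weight plus the usual low-rank LoRA product. The shapes, the initialization, and the helper name `dora_weight` are my own assumptions for illustration, not code from the article or the paper.

```python
import numpy as np

def dora_weight(W0, A, B, m):
    """Hypothetical helper: merged weight = magnitude * unit-norm direction."""
    V = W0 + B @ A                                       # directional matrix: pretrained weight + low-rank update
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)  # one L2 norm per column
    return m * (V / col_norm)                            # rescale each unit-norm column by its magnitude

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2                                 # toy sizes, chosen arbitrarily

W0 = rng.normal(size=(d_out, d_in))                      # stands in for a pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))                                 # LoRA-style init: B = 0, so B @ A = 0 at the start
m = np.linalg.norm(W0, axis=0, keepdims=True)            # magnitude starts at the column norms of W0

# At initialization the merged weight is just the pretrained weight.
W_init = dora_weight(W0, A, B, m)

# After "training" (simulated here with a random B), the direction changes,
# but each column's norm is still set by m alone.
B_trained = rng.normal(size=(d_out, r))
W_trained = dora_weight(W0, A, B_trained, m)
```

The point of the split is that m controls how large each column is, while the low-rank update only changes where it points.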

Not related to the article, but tangentially relevant: would it be possible to train a LoRA or DoRA with a high rank, then use SVD to check whether the rank is too high and truncate to a better value of r? Maybe even use different ranks for different layers after some training?
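
To make the SVD idea concrete, here is an untested-in-practice sketch for plain LoRA: take the trained product BA, look at its singular values, and keep the smallest rank that retains a chosen fraction of the squared singular values. The function name `truncate_lora`, the `energy` threshold, and the toy data are all assumptions of mine; whether the truncated adapter keeps its task performance is exactly the open question.

```python
import numpy as np

def truncate_lora(A, B, energy=0.99):
    """Hypothetical helper: re-factor B @ A at the smallest rank keeping
    `energy` of the squared singular values. Assumes B @ A is not all zeros."""
    delta = B @ A                                          # full low-rank update, shape (d_out, d_in)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)   # singular values come back sorted, largest first
    cum = np.cumsum(S**2) / np.sum(S**2)                   # cumulative share of squared singular values
    r_new = min(int(np.searchsorted(cum, energy)) + 1, len(S))
    B_new = U[:, :r_new] * S[:r_new]                       # fold the singular values into the left factor
    A_new = Vt[:r_new, :]
    return A_new, B_new, r_new

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 24, 16                                # nominal rank 16

# Toy adapter whose effective rank is only 3: the last 13 columns of B are zero.
B = np.hstack([rng.normal(size=(d_out, 3)), np.zeros((d_out, r - 3))])
A = rng.normal(size=(r, d_in))

A_new, B_new, r_new = truncate_lora(A, B)
rel_err = np.linalg.norm(B @ A - B_new @ A_new) / np.linalg.norm(B @ A)
print(r_new, rel_err)
```

Running this per layer would give a different r_new for each one, which is the "different ranks for different layers" part. For DoRA I would guess the same trick applies to the BA term only, since the magnitude vector is separate, but the normalization step means a small change in BA does not translate linearly into the merged weight.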

Thanks for the feedback. Clarifying definitely wouldn't hurt. Added a paragraph and new figure at the top of the DoRA section: https://magazine.sebastianraschka.com/i/141797214/introducin...

I haven't tried what you're suggesting, but that actually sounds plausible. Interesting idea!