Hacker News new | ask | show | jobs
by batterseapower 775 days ago
The other recent improvement suggested for LoRA is DoRA: https://magazine.sebastianraschka.com/p/lora-and-dora-from-s.... It really does seem to strongly outperform LoRA - see also https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...
4 comments

I just skimmed over LoRA+ and DoRA and I see no reason why these improvements could not go hand in hand. Actually, LoRA+ seems to be about efficient training while DoRA seems about improving the ability to actually learn, making it significantly more robust. Although I still have my questions on how the improvements of LoRA+ would be applied to the magnitude vector.
The two methods seem to be independent, wonder if you can combine them for even better performance.

Interestingly both seem to indirectly modify the optimisation process, in my opinion effectively trying to fix a bad optimiser. Seems like we still have a long way to go after Adam...

> Seems like we still have a long way to go after Adam...

A preprint in arxiv suggests that Adam works better than SGD for training LLMs due to the issue of class-imbalance [0]. It appears that scaling the gradient step helps with the training, for example, see another approach suggested in [1].

0. https://arxiv.org/pdf/2402.19449 1. https://arxiv.org/pdf/2402.02347

I've just started playing with DoRAs for fine-tuning TTS models towards particular styles of speech, and they're working extremely well!
Can you tell us more about it? Have you reported the results of your experiments in a post?
Count me interested here as well, specially if it is about the style of speech. I had a fun project in mind that involved the style of speech.
Those blog posts are pretty bad. Just read the original paper, https://arxiv.org/pdf/2402.09353. The key section is 4.1.