	by batterseapower 822 days ago
	The other recent improvement suggested for LoRA is DoRA: https://magazine.sebastianraschka.com/p/lora-and-dora-from-s.... It really does seem to strongly outperform LoRA - see also https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...

4 comments

josalhor 822 days ago

I just skimmed over LoRA+ and DoRA and I see no reason why these improvements could not go hand in hand. LoRA+ seems to be about efficient training, while DoRA seems to be about improving the ability to actually learn, making it significantly more robust. That said, I still have questions about how the improvements of LoRA+ would be applied to the magnitude vector.
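
To make the question concrete, here is a minimal sketch of how the two ideas might sit together. It assumes the DoRA formulation from the paper linked elsewhere in this thread (the merged weight is a magnitude vector m applied to the column-normalised W0 + BA) and the LoRA+ idea of giving B a larger learning rate than A. The helper names, the ratio of 16, and the choice to keep m at the base learning rate are illustrative assumptions only; neither paper prescribes them for the combination.

```python
import math

def matmul(X, Y):
    # Plain-Python matrix product for small lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def column_norms(W):
    # L2 norm of every column of W.
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]

def dora_weight(W0, B, A, m):
    # Direction: frozen weight plus the low-rank update B @ A.
    BA = matmul(B, A)
    V = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W0, BA)]
    # Rescale each column to the learned magnitude m[j].
    norms = column_norms(V)
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]

def lora_plus_lrs(base_lr, ratio=16.0):
    # Hypothetical per-group learning rates: B gets ratio * base_lr (LoRA+ style).
    # Leaving the magnitude vector m at base_lr is an assumption, not a recommendation.
    return {"A": base_lr, "B": base_lr * ratio, "m": base_lr}

W0 = [[3.0, 0.0], [4.0, 2.0]]   # frozen pretrained weight (2 x 2)
A = [[0.5, -0.5]]               # rank-1 factor (1 x 2)
B = [[0.0], [0.0]]              # zero-initialised, so B @ A starts at zero
m = column_norms(W0)            # magnitude initialised from W0

print(dora_weight(W0, B, A, m))  # reproduces W0 while B is still zero
print(lora_plus_lrs(1e-4))
```

Whether m should share the learning rate of A, of B, or get its own is exactly the open question above; the dictionary just makes that a single knob to experiment with.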

WithinReason 822 days ago

The two methods seem to be independent; I wonder whether you can combine them for even better performance.

Interestingly, both seem to indirectly modify the optimisation process, in my opinion effectively trying to fix a bad optimiser. Seems like we still have a long way to go after Adam...

neodypsis 822 days ago

> Seems like we still have a long way to go after Adam...

A preprint on arXiv suggests that Adam works better than SGD for training LLMs because of class imbalance [0]. Scaling the gradient step appears to help with training; for example, see another approach suggested in [1].

[0] https://arxiv.org/pdf/2402.19449
[1] https://arxiv.org/pdf/2402.02347
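
As a rough illustration of what "scaling the gradient step" means, the sketch below compares a plain SGD step with the first step of textbook Adam for a very large and a very small gradient. This is the standard Adam update with its usual default hyperparameters, not the specific method of either preprint; the function names are made up for the example.

```python
import math

def sgd_step(grad, lr=0.01):
    # SGD: step size is proportional to the raw gradient.
    return -lr * grad

def adam_first_step(grad, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # First Adam update (t = 1), starting from zero moment estimates.
    m = (1 - beta1) * grad            # first-moment estimate
    v = (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1)           # bias correction for t = 1
    v_hat = v / (1 - beta2)
    # Dividing by sqrt(v_hat) rescales the step per parameter.
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

for g in (100.0, 0.001):
    print(g, sgd_step(g), adam_first_step(g))
```

The SGD steps differ by the same factor as the gradients, while the Adam steps come out at roughly the learning rate in both cases, so parameters that rarely receive large gradients still move.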

Ger_Onimo 822 days ago

I've just started playing with DoRAs for fine-tuning TTS models towards particular styles of speech, and they're working extremely well!

allpaca 822 days ago

Can you tell us more about it? Have you reported the results of your experiments in a post?

mysfi 822 days ago

Count me interested here as well, especially if it is about the style of speech. I had a fun project in mind that involved the style of speech.

cooljoseph 822 days ago

Those blog posts are pretty bad. Just read the original paper, https://arxiv.org/pdf/2402.09353. The key section is 4.1.
