by lelanthran 261 days ago

> I'm surprised they didn't cite this; it's a well known paper.

I'm surprised you copied and pasted all of that without explaining what it means.

Does LoRA perform worse than FullFT, better, or is the difference statistically insignificant?

You aren't able to tell from what you pasted, are you?

cheald 261 days ago

Standard LoRA (W_delta = B@A with standard inits) generally underperforms FT, primarily because of "intruder dimensions" (new high-ranking singular vectors which misalign with the singular vectors of the underlying weights) as outlined in the paper.

There are techniques like PiCa and SVFT which can mitigate much of the loss, though.
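To make the "intruder dimensions" idea above concrete, here is a minimal NumPy sketch. It is a toy under stated assumptions, not the paper's code: the function names, the threshold `eps=0.5`, and the hand-built matrices are all hypothetical. Note that standard LoRA initialises B to zero, so the `B` below stands in for an already-trained update. The check is simply: for each top singular vector of the tuned weights, what is its best cosine match among the singular vectors of the base weights?

```python
import numpy as np

def max_cosine_to_base(W_base, W_tuned, k):
    """For each of the top-k left singular vectors of W_tuned, return its
    largest absolute cosine similarity with any left singular vector of W_base."""
    U_base, _, _ = np.linalg.svd(W_base)
    U_tuned, _, _ = np.linalg.svd(W_tuned)
    # Columns of U are unit vectors, so dot products are cosine similarities.
    sims = np.abs(U_tuned[:, :k].T @ U_base)  # shape (k, d)
    return sims.max(axis=1)

def count_intruders(W_base, W_tuned, k, eps=0.5):
    """Count top-k tuned singular vectors whose best match in the base is below eps.
    eps is an illustrative threshold chosen for this toy, not a value from the thread."""
    return int((max_cosine_to_base(W_base, W_tuned, k) < eps).sum())

# Toy base weights: diagonal, so its singular vectors are the standard basis.
d = 16
W_base = np.diag(np.linspace(2.0, 1.0, d))

# Hand-built rank-1 "LoRA" update W_delta = B @ A with a large singular value,
# pointing along a direction that is spread evenly over all basis vectors.
B = 50.0 * np.ones((d, 1)) / np.sqrt(d)
A = np.ones((1, d)) / np.sqrt(d)
W_delta = B @ A
W_tuned = W_base + W_delta

print(max_cosine_to_base(W_base, W_base, k=1))   # base compared with itself
print(max_cosine_to_base(W_base, W_tuned, k=1))  # top vector after the update
print(count_intruders(W_base, W_tuned, k=1))
```

In this toy, the update's direction dominates the top of the spectrum, and that new top singular vector has a best cosine match of only about 0.25 (roughly 1/sqrt(16)) with anything in the base, which is the kind of misalignment the term describes.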

tangjurine 260 days ago

PiCa came out two days ago; how did you find out about it?

cheald 260 days ago

The one I was referring to was from this paper, first published in May: https://arxiv.org/abs/2505.20211v1

I don't recall how I found out about it, but it was either paperswithcode or an LLM research session working through the intruder dimensions problem.

In my Stable Diffusion tests, it substantially improves LoRA training speed and fidelity, though I've got some experiments that seem to improve on it substantially further by adding learnable rotations of the singular vectors.

crimsoneer 261 days ago

If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?

lelanthran 261 days ago

> If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?

The answer is "There's a difference, perhaps", but the GP appeared to imply that LoRA performed worse.

My understanding is that the paper found differences but did not conclude that they made LoRA quantifiably better or worse, which is not what GP's post implied.

p1esk 261 days ago

The paper does not draw any clear conclusions about LoRA vs FullFT performance, beyond "the two methods seem to be learning different things".
