| HN Mirror



	by grumpopotamus 525 days ago
	2x faster than what?


danielhanchen 525 days ago

Oh 2x faster and uses >70% less memory than Hugging Face + Flash Attention 2! I gave a CUDA / GPU Mode talk about it here: https://www.youtube.com/watch?v=hfb_AIhDYnA I also presented it to the PyTorch team here: https://www.youtube.com/watch?v=MQwryfkydc0 and at the PyTorch Conference here: https://www.youtube.com/watch?v=PdtKkc5jB4g


kouteiheika 525 days ago

> Oh 2x faster and uses >70% less memory than Hugging Face + Flash Attention 2!

Is this doing the same type of fine-tuning, or are you comparing full bf16 fine-tuning in HF with 4-bit QLoRA in Unsloth (in which case it's not really an apples-to-apples comparison)? If it's the latter then do you have a comparison of the former?
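To illustrate why the precision of the baseline matters here, a rough back-of-the-envelope sketch of weight memory at bf16 versus 4-bit. The 7B parameter count is a hypothetical example, not a figure from this thread, and the estimate covers only the frozen base weights: it ignores quantization constants, LoRA adapter weights, gradients, optimizer state, and activations.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # bf16 stores 16 bits per weight
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized base weights

print(f"bf16 weights:  {bf16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print(f"ratio:         {bf16_gib / int4_gib:.1f}x")
```

Under these assumptions the weights alone shrink by 4x from quantization, before any framework-level optimization, so a memory comparison is only meaningful when both sides use the same precision and the same fine-tuning method.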


danielhanchen 525 days ago

Oh, I compared 4-bit QLoRA on HF+FA2 with 4-bit QLoRA on Unsloth.

16-bit LoRA gets similar performance boosts!

Full bf16 fine-tuning is not yet supported, but it'll come out soon!
