| HN Mirror

	by brucethemoose2 962 days ago
	Cool! But the GitHub repo isn't visible for me yet. Also, can y'all dumb it down for a simple end user like me? Is this actually distilling the model down to a smaller parameter count, or is it just reducing VRAM/compute during training and inference with a LoRA? Or something else?

1 comment

ibuildthings 961 days ago

The GitHub repo should be visible now.

It is not distilling the model. It prunes the model weights on the fly and uses LoRA for training/fine-tuning. After the training phase, we explain how to merge the LoRA weights with the pruned weights to achieve faster inference speed.
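For anyone who wants the merge step spelled out, here is a rough sketch of the general idea, not the project's actual code. It assumes a simple magnitude-threshold pruning rule and the usual LoRA update `W + (alpha / rank) * B @ A`; every name below (`magnitude_mask`, `merge_lora`, the toy matrices) is hypothetical and only there for illustration.

```python
# Illustrative sketch only: the pruning rule, names, and shapes are
# assumptions, not taken from the project's repository.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def magnitude_mask(w, threshold):
    """1 where |weight| >= threshold, else 0 (a stand-in pruning rule)."""
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in w]

def merge_lora(w, mask, lora_b, lora_a, alpha, rank):
    """Fold the adapter into the pruned weights: mask * W + scale * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[m * wv + scale * d for wv, m, d in zip(w_row, m_row, d_row)]
            for w_row, m_row, d_row in zip(w, mask, delta)]

def linear(w, x):
    """Apply weight matrix w (out x in) to input vector x."""
    return [sum(wv * xv for wv, xv in zip(row, x)) for row in w]

# Toy 2x2 layer with a rank-1 adapter.
w = [[0.5, -0.01], [0.02, -0.8]]
mask = magnitude_mask(w, threshold=0.1)   # drops the two small weights
lora_b = [[0.1], [0.2]]                   # shape: out x rank
lora_a = [[1.0, -1.0]]                    # shape: rank x in
alpha, rank = 2, 1

merged = merge_lora(w, mask, lora_b, lora_a, alpha, rank)

# Unmerged path: pruned base layer plus a separate adapter branch.
x = [1.0, 2.0]
pruned = [[m * wv for wv, m in zip(wr, mr)] for wr, mr in zip(w, mask)]
base_out = linear(pruned, x)
adapter_out = linear(lora_b, linear(lora_a, x))
unmerged_out = [b + (alpha / rank) * a for b, a in zip(base_out, adapter_out)]

# Merged path: one matrix-vector product, same result.
merged_out = linear(merged, x)
```

The merged layer does a single multiply instead of a base multiply plus two adapter multiplies, which is where the inference speedup from merging comes from. One caveat with this naive version: `B @ A` is dense, so the merged matrix no longer has zeros where the mask pruned weights. Whether and how sparsity is preserved after merging depends on the method and is not shown here.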
