	by ibuildthings 959 days ago
	The GitHub repo should be visible now. It does not distill the model; it prunes the model weights on the fly and uses LoRA for training/fine-tuning. After the training phase, we explain how to merge the LoRA weights with the pruned weights to achieve faster inference.
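
To make the merge step concrete, here is a minimal sketch of folding a LoRA update into an already-pruned weight matrix. It is not taken from the repo: the function names, the toy shapes, and the `alpha / rank` scaling are assumptions based on the usual LoRA formulation, and plain Python lists are used only to keep the example dependency-free.

```python
# Illustrative sketch only; all names and values here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matvec(w, x):
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def merge_lora(w_pruned, lora_b, lora_a, alpha, rank):
    """Return w_pruned + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_pruned, delta)]

# Assumed toy layer: 2 outputs, 3 inputs, LoRA rank 1.
w_pruned = [[1.0, 0.0, 2.0],
            [0.0, 3.0, 0.0]]
lora_b = [[0.5], [-1.0]]     # shape: out x rank
lora_a = [[1.0, 2.0, 0.0]]   # shape: rank x in
alpha, rank = 2.0, 1

x = [1.0, 1.0, 1.0]

# Unmerged path: base matmul plus a separate low-rank branch.
base_out = matvec(w_pruned, x)
lora_out = matvec(lora_b, matvec(lora_a, x))
unmerged_out = [b + (alpha / rank) * l for b, l in zip(base_out, lora_out)]

# Merged path: one matmul against the combined weights.
w_merged = merge_lora(w_pruned, lora_b, lora_a, alpha, rank)
merged_out = matvec(w_merged, x)

print(unmerged_out, merged_out)
```

Both paths give the same output, but the merged one needs a single matrix multiply per layer, which is where the inference speedup from merging comes from. In this toy case the merge also fills in some entries that were zero in `w_pruned`, so whether any sparsity survives depends on how the pruning is done; the repo's own procedure may differ.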