by brucethemoose2 962 days ago
Cool! But the GitHub repo isn't visible for me yet.

Also, can y'all dumb it down for a simple end user like me? Is this actually distilling the model down to a smaller parameter count, or is it just reducing VRAM/compute during training and during inference with a LoRA? Or something else?

1 comment

The GitHub repo should be visible now.

It is not distilling the model. It prunes the model weights on the fly and uses LoRA for training/fine-tuning. After the training phase, we explain how to merge the LoRA weights with the pruned weights to achieve faster inference.
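
To make the merge step a bit more concrete, here is a rough sketch of a generic LoRA merge in plain Python. It is not the project's actual code: `merge_lora`, `matmul`, `apply`, the `scale` parameter, and the toy matrices are all made up for illustration, and how the real implementation treats pruned entries during the merge is not covered here. The idea is only that a low-rank update B·A can be folded into the weight matrix once, so inference needs a single matrix multiply instead of two paths.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(w_pruned, lora_b, lora_a, scale=1.0):
    """Return a new matrix w_pruned + scale * (lora_b @ lora_a).

    Hypothetical helper: a generic LoRA merge, not the library's API.
    """
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_pruned, delta)]


def apply(w, x):
    """Compute the matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Toy 2x2 weight matrix; the zeros stand in for pruned entries.
w_pruned = [[1.0, 0.0],
            [0.0, 2.0]]
lora_b = [[1.0], [2.0]]   # shape 2x1 (rank 1)
lora_a = [[0.5, 0.5]]     # shape 1x2
scale = 2.0               # assumed scaling factor (often alpha / rank)

merged = merge_lora(w_pruned, lora_b, lora_a, scale)

# Unmerged path: base output plus the scaled low-rank correction.
x = [1.0, 3.0]
base_out = apply(w_pruned, x)
lora_out = apply(lora_b, apply(lora_a, x))
unmerged = [b + scale * l for b, l in zip(base_out, lora_out)]

# Merged path: a single matrix-vector product gives the same result.
merged_out = apply(merged, x)
```

In this toy case both paths produce the same output, which is why merging can speed up inference: the adapter no longer adds a separate computation per layer.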