by ibuildthings 959 days ago
The GitHub repo should be visible now.

It does not distill the model. It prunes the model weights on the fly and uses LoRA for training/fine-tuning. After the training phase, we explain how to merge the LoRA weights into the pruned weights so that inference is faster.
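
For anyone wondering what the merge step amounts to, here is a minimal NumPy sketch of the standard LoRA merge, W_merged = W + (alpha / r) * B @ A, applied to a weight matrix that has already been pruned. It is not taken from the repo: the function name `merge_lora`, the shapes, and the `alpha`/`rank` values are all made up for illustration, and the random matrices just stand in for real weights.

```python
import numpy as np


def merge_lora(w_pruned, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into an already pruned base weight.

    Hypothetical helper for illustration only.
    w_pruned: (d_out, d_in), lora_a: (rank, d_in), lora_b: (d_out, rank).
    """
    scale = alpha / rank
    return w_pruned + scale * (lora_b @ lora_a)


# Toy sizes and random values, assumed purely for the example.
rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 6, 8, 2, 4.0
w_pruned = rng.normal(size=(d_out, d_in))  # stands in for a pruned layer
lora_a = rng.normal(size=(rank, d_in))
lora_b = rng.normal(size=(d_out, rank))
x = rng.normal(size=(3, d_in))             # a small batch of inputs

# Before merging: base path plus a separate low-rank adapter path.
y_adapter = x @ w_pruned.T + (alpha / rank) * (x @ lora_a.T @ lora_b.T)

# After merging: a single matmul with the folded weight.
w_merged = merge_lora(w_pruned, lora_a, lora_b, alpha, rank)
y_merged = x @ w_merged.T

assert np.allclose(y_adapter, y_merged)
```

The merged matrix has the same shape as the pruned one, so the adapter path disappears at inference time and each layer goes back to a single matmul.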