by ibuildthings
959 days ago
The GitHub repo should be visible now. It is not distilling the model; it reduces (prunes) the model weights on the fly and uses LoRA for training/fine-tuning. After the training phase, we explain how to merge the LoRA weights with the pruned weights to get faster inference.
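For anyone wondering what "merging" means here, a minimal sketch in plain Python, assuming the standard LoRA formulation where the adapter adds `(alpha / r) * B @ A` to a frozen weight matrix. This is not the repo's code: the names `merge_lora`, `w_pruned`, `lora_a`, `lora_b`, and `alpha`, and the toy numbers, are made up for illustration.

```python
# Illustrative only: a plain-Python LoRA merge, not the repo's actual code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = len(lora_a)                  # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)   # shape (out_features, in_features)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def forward(w, x):
    """Apply a weight matrix to an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Hypothetical already-pruned layer: 2 outputs, 3 inputs, rank-1 adapter.
w_pruned = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # (r=1, in=3)
lora_b = [[0.5], [-1.0]]     # (out=2, r=1)
alpha = 2.0

w_merged = merge_lora(w_pruned, lora_a, lora_b, alpha)
```

After the merge, inference is a single matrix product per layer instead of the base product plus two extra adapter products, which is where the speedup would come from.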