Hacker News new | ask | show | jobs
by tylerekahn 1180 days ago
It’s actually as low as 0.01% of the original weights.

From the LoRa paper:

>When the pre-trained model is GPT-3 175B, the number of train- able parameters |Θ| can be as small as 0.01% of |Φ0|.