by MacsHeadroom
1160 days ago
That's what pruning is, but it's not that straightforward and it has limits. Finetuning a smaller model on the output of a larger one is much more flexible and reliable. To give you an idea of the technique, GPT-3.5 is probably a 13B Curie finetuned on the output of the full-size 175B GPT-3. That is smaller than the third-smallest StableLM and the same size as LLaMA-13B, which can run at useful speeds on a smartphone CPU.
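
For anyone unfamiliar with "finetuning a smaller model on the output of a larger one", here is a toy sketch of the idea in plain Python. It is only an illustration under simplifying assumptions: the "teacher" is a fixed probability distribution over three tokens, the "student" is a bare vector of logits, and the names `softmax`, `distill`, `teacher_probs` and `student_probs` are made up for this example. It says nothing about how any real model was actually trained.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill(teacher_probs, steps=500, lr=0.5):
    """Fit student logits to the teacher's output distribution.

    Minimizes the cross-entropy between teacher and student outputs
    with plain gradient descent. For a softmax student, the gradient
    with respect to logit i is (student_prob_i - teacher_prob_i).
    """
    student_logits = [0.0] * len(teacher_probs)
    for _ in range(steps):
        student_probs = softmax(student_logits)
        student_logits = [
            z - lr * (q - p)
            for z, q, p in zip(student_logits, student_probs, teacher_probs)
        ]
    return softmax(student_logits)


# Hypothetical teacher output over a 3-token vocabulary.
teacher_probs = softmax([2.0, 1.0, 0.1])
student_probs = distill(teacher_probs)
```

In a real setup the student would be a full (smaller) network and the loss would be averaged over many prompts, but the training signal is the same: the student is pushed toward whatever the teacher outputs rather than toward the original training labels.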
|
What is the basis for this assessment?