|
|
|
|
|
by Mathnerd314
373 days ago
|
|
Wasn't there that thing about how large LLM's are essentially compression algorithms (https://arxiv.org/pdf/2309.10668)? Maybe that's where this article is coming from, is the idea that finetuning "adds" data to the set of data that compresses well. But that indeed doesn't work unless you mix in the finetuning data with the original training corpus of the base model. I think the article is wrong though in saying it "replaces" the data - it's true that finetuning without keeping in the original training corpus increases loss on the original data, but "large" in LLM really is large and current models are not trained to saturation so there is plenty of room to fit in finetuning if you do it right. |
|