by scosman
714 days ago
That’s still the point. That model now does exactly one thing, and because of that it can do better than a model 50x the size that tries to do everything. It will crush the bigger model on instruction following and consistency. A fine-tuned 500B-parameter model would probably beat the fine-tuned 7B model, but only by a bit (depending on the task, obviously). A lot of the larger model’s capacity is being used for knowledge, and isn’t needed for extraction/classification tasks. Fine-tuning isn’t touching most of those weights. The smaller models need to focus on more general language skills, not on answering “describe the evolution of France’s economy in the 1800s”.