by danysdragons
1187 days ago
I had a similar impression from what I saw. It may perform as well as GPT-3 on the narrow tasks it was explicitly fine-tuned on, but that parity seems to collapse as soon as you go off the beaten track and give it harder tasks that involve significant reasoning. Consistent with that, I've seen a few different sources claim that a small model fine-tuned on the outputs of a large one would likely struggle with unfamiliar tasks or contexts that require transfer learning or abstraction. After seeing how it actually performs in practice, it's hard to have confidence that these benchmarks are reliable measures of model quality.