by appplication
914 days ago
I mean, it isn’t too surprising that smaller models do better. I imagine transformers are as prone to overfitting as any other statistical model. There is also probably some selection bias: bigger models are more expensive, so fewer people are training and iterating on them.

I can’t imagine this is anything but selection bias.