|
|
|
|
|
by peaslock
1274 days ago
|
|
Yes, but the losses in Figure 3 increase because the larger models see fewer data to keep the FLOP budget constant, not because of overfitting. Large models do not overfit very much, so the loss of a larger model will still be better compared to a smaller model when you keep dataset size constant. |
|