|
|
|
|
|
by zone411
1277 days ago
|
|
No. It shows the opposite. All model sizes converged to a similar loss as the compute increased towards maximum. But larger models had larger loss for a given compute budget. Their text about Figure 3 confirms what I'm saying: "We find a clear valley in loss, meaning that for a given FLOP budget there is an optimal model to train" |
|