It’s my understanding that the entire race to ever-more parameters was driven by that.
Newer large datasets like the ones used here optimize for diversity (e.g. SlimPajama is a heavily deduplicated dataset).
Yeah, the line keeps going down as the model gets bigger. What's your point? That there's a hump in the middle?