by e12e
1016 days ago
Are they upsampling, whatever that means in the context of datasets? AFAIU SlimPajama is about 627B tokens, and StarCoder:

> approximately 250 Billion tokens.

Ed: I see TFA says:

> Combined Dataset Size - Around 950B tokens

> Total Tokens During Training - 3 trillion (slightly more than 3 epochs/1430k steps)

... but I'm not seeing how one becomes three?

That's more like 1 trillion than 3 trillion tokens?
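For what it's worth, the arithmetic behind the quoted figures can be sketched as below. This is only a back-of-the-envelope check, assuming the quoted numbers are accurate and that an "epoch" means one full pass over the combined ~950B-token set; the variable and function names are mine, not from TFA. It does not explain the gap between the naive sum and the ~950B combined figure.

```python
# Figures as quoted in the comment above; none are independently verified.
SLIMPAJAMA_TOKENS = 627e9     # "about 627B tokens"
STARCODER_TOKENS = 250e9      # "approximately 250 Billion tokens"
COMBINED_TOKENS_TFA = 950e9   # "Around 950B tokens", per TFA
TRAINING_TOKENS = 3e12        # "3 trillion", per TFA


def epochs(total_training_tokens, dataset_tokens):
    """Number of full passes over the dataset implied by the token budget."""
    return total_training_tokens / dataset_tokens


# Adding the two datasets gives roughly 1 trillion tokens, not 3 trillion.
naive_sum = SLIMPAJAMA_TOKENS + STARCODER_TOKENS
print(f"naive sum: {naive_sum / 1e9:.0f}B tokens")

# Assumption: the 3T figure counts repeated passes over the same data.
print(f"epochs over 950B: {epochs(TRAINING_TOKENS, COMBINED_TOKENS_TFA):.2f}")
```

Under those assumptions the 3 trillion figure would come from seeing the same ~1 trillion tokens about three times, which matches the "slightly more than 3 epochs" in the quote.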