by e12e 1016 days ago
Are they upsampling - whatever that means in the context of datasets?

AFAIU SlimPajama is about 627B tokens, and StarCoder:

> approximately 250 Billion tokens.

Ed: I see TFA says:

> Combined Dataset Size - Around 950B tokens

> Total Tokens During Training - 3 trillion (slightly more than 3 epochs/1430k steps)

... but I'm not seeing how one becomes three? That's more like 1 trillion than 3 trillion tokens?

1 comment

Three epochs means the model sees each token about three times. The dataset is ~1T tokens, like you said, so roughly three passes over it adds up to ~3T tokens seen during training.
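A quick back-of-the-envelope check, using only the figures quoted from TFA above (around 950B combined tokens, 3 trillion total, 1430k steps). The variable names are just for illustration, and the tokens-per-step number is derived from those figures rather than stated in the thread:

```python
# Figures quoted from TFA in the parent comment (all approximate).
combined_dataset_tokens = 950e9   # "Around 950B tokens"
total_training_tokens = 3e12      # "3 trillion"
training_steps = 1_430_000        # "1430k steps"

# Epochs = how many times training passes over the whole dataset.
epochs = total_training_tokens / combined_dataset_tokens

# Derived, not stated in the thread: average tokens consumed per step.
tokens_per_step = total_training_tokens / training_steps

print(f"epochs: {epochs:.2f}")
print(f"tokens per step: {tokens_per_step:,.0f}")
```

So the 3T figure counts tokens processed, with repeats, while the ~950B figure counts unique tokens in the dataset. The ratio comes out a bit above 3, which matches "slightly more than 3 epochs".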