Hacker News
by
Version467
795 days ago
We don't actually know how big the dataset is, right? It could be the same dataset used for Llama 2, just trained for more epochs.
GaggiX
795 days ago
The dataset is 7 times bigger than the one used for Llama 2, as reported by Meta.
baobabKoodaa
792 days ago
Has Meta disclosed how much of the dataset was repeated? I've only seen the "number of tokens trained" figure.