| HN Mirror


	by GaggiX 795 days ago
	The dataset is 7 times bigger than the dataset used for Llama 2 as reported by Meta.

1 comment

Has Meta disclosed how much of the dataset was repeated? I've only seen the "number of tokens trained" figure.