| HN Mirror

	by stephenroller 2203 days ago
	The dataset can be assembled from sources available around the web. It's mostly CommonCrawl, Reddit, Toronto Book Corpus, and Wikipedia. A very comparable corpus is open-sourced and easy to use in the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...).