by stephenroller 2157 days ago
The dataset can be assembled from sources around the web. It's mostly CommonCrawl, Reddit, the Toronto Book Corpus, and Wikipedia.

You can find a very comparable corpus, open-sourced and easy to use, in the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...).