by freeqaz 2162 days ago
$35-40 an hour is well within the range of a "that sounds fun to grab my friends and mess around with it for a few hours on the weekend" budget!

Especially if you can use spot instances or a cheaper cloud host.

But I guess without the weights, the floor for playing around with this is several thousand dollars.

Do you know if the data set is being released?

1 comment

The dataset can be pieced together from sources around the web. It's mostly CommonCrawl, Reddit, Toronto Book Corpus, and Wikipedia.

You can find a very comparable corpus, open-sourced and easy to use, on the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...).