by stephenroller 2159 days ago
No, they aren't releasing the weights. They are releasing it as ML as a service. Right now it's in free beta, but it will open up for commercial usage in the future.

On another note:

At 175B parameters in float16, the weights alone take about 350GB in memory, and activations would bring the total to around 400GB. You would need 12 or 13 V100 32GB GPUs to hold it in memory, or three p3.8xlarge instances. That means loading it on AWS would cost around $35-40/hr.
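For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The 32GB per-GPU figure and the $12.24/hr on-demand price for a p3.8xlarge are assumptions on my part, not numbers from the announcement, and the helper name is made up for illustration.

```python
import math

def gpus_needed(n_params, bytes_per_param, total_gb, gpu_mem_gb):
    """Return (weights footprint in GB, GPUs needed to hold total_gb)."""
    weights_gb = n_params * bytes_per_param / 1e9
    n_gpus = math.ceil(total_gb / gpu_mem_gb)
    return weights_gb, n_gpus

# 175B parameters at 2 bytes each (float16); ~400GB once activations are included.
# Assumption: 32GB of memory per V100.
weights_gb, n_gpus = gpus_needed(175e9, 2, total_gb=400, gpu_mem_gb=32)

# Assumption: roughly $12.24/hr on-demand per p3.8xlarge, times three instances.
hourly_cost = 3 * 12.24

print(f"weights: {weights_gb:.0f}GB, GPUs: {n_gpus}, cost: ${hourly_cost:.2f}/hr")
```

Spot pricing or a different region would move the hourly number, so treat it as a ballpark.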

Though if you didn't care about speed, you could load up the weights from disk one at a time and forward through it a few layers at a time on a single GPU.
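A toy sketch of that idea, in plain Python so it runs anywhere: each layer's weights live in their own file, and only one layer is in memory at a time during the forward pass. The file layout, the function names, and the tiny linear + ReLU "layers" are all made up for illustration; a real setup would stream tensors onto the GPU with a framework instead.

```python
import pickle
import tempfile
from pathlib import Path

def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file."""
    paths = []
    for i, weights in enumerate(layers):
        path = Path(directory) / f"layer_{i}.pkl"
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def streamed_forward(paths, x):
    """Run the forward pass while holding only one layer's weights at a time."""
    for path in paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)  # load just this layer from disk
        x = [max(0.0, v) for v in matvec(weights, x)]  # toy linear layer + ReLU
        del weights  # drop the layer before loading the next one
    return x

with tempfile.TemporaryDirectory() as tmp:
    layers = [
        [[1.0, 0.0], [0.0, 2.0]],
        [[1.0, 1.0], [-1.0, 1.0]],
    ]
    paths = save_layers(layers, tmp)
    output = streamed_forward(paths, [3.0, 4.0])

print(output)  # [11.0, 5.0]
```

The trade-off is the one mentioned above: peak memory drops to roughly one layer plus activations, but every forward pass pays for reading all the weights off disk again.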


$35-40 an hour is well within the range of a "that sounds fun to grab my friends and mess around with it for a few hours on the weekend" budget!

Especially if you can use spot instances or a cheaper cloud host.

But I guess without the weights, the floor for this is several thousand dollars to play around with.

Do you know if the data set is being released?

The dataset can be obtained around the web. It's mostly CommonCrawl, Reddit, Toronto Book Corpus, and Wikipedia.

You can find a very comparable corpus open sourced and easy to use on the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...)