Another example of Google giving away a lot of data is the 50 trillion digits of pi [1], which come to about 42 TB (decimal and hexadecimal combined).
Google Cloud Storage. The files could be dumped as TFRecords in a bucket with "requester pays" enabled. Then anybody could reproduce it using the open source code, paying only for the cost of moving the data from GCS to the training nodes.
It's interesting. It would take ~six years for the Z8 to break even compared to AWS, but traffic into and out of the machine would be $0, and I don't think you're running directly on the metal with AWS, so performance would probably be a bit higher. And then there's storage - I configured, uhh, 120TB of a mixture of SSDs and HDDs. I'm not even going to try to ask AWS for a comparable quote there.
I may or may not have added dual Xeon Platinum 8280s to the Z8 as well. :P
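For anyone who wants to redo the break-even estimate with their own quotes, here is a minimal sketch of the arithmetic. The function name and every dollar figure are hypothetical placeholders, not the actual Z8 or AWS prices; they are chosen only so the result lands at six years.

```python
def break_even_years(upfront_cost, cloud_cost_per_year, local_cost_per_year=0.0):
    """Years until buying a workstation is cheaper than renting in the cloud.

    All inputs are in the same currency. local_cost_per_year covers
    power, cooling and so on (assumed 0 by default for simplicity).
    """
    yearly_saving = cloud_cost_per_year - local_cost_per_year
    if yearly_saving <= 0:
        raise ValueError("the cloud is never more expensive; no break-even point")
    return upfront_cost / yearly_saving


# Hypothetical numbers, NOT real quotes.
years = break_even_years(upfront_cost=60_000, cloud_cost_per_year=10_000)
print(f"break-even after {years:.1f} years")
```

Adding a nonzero `local_cost_per_year` pushes the break-even point further out, and adding cloud egress fees to `cloud_cost_per_year` pulls it in.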
(They are definitely going to exceed their storage quotas.)
I want to see how well weights for these models compress, but it will take me some time to run this code and generate some. I'm guessing they won't compress well, but I can't articulate a reason why.
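In the meantime, here is a rough sketch of how one could measure it, using only the Python standard library. It does not use real model weights: the "weights" are Gaussian noise stored as float32, which is purely an assumption about what trained weights look like. One plausible reason for poor compression is that the low mantissa bits of such floats are close to random, and a general-purpose compressor can't do much with that.

```python
import random
import zlib
from array import array


def compression_ratio(values, level=9):
    """Return compressed_size / raw_size for a sequence of floats stored as float32."""
    raw = array("f", values).tobytes()
    return len(zlib.compress(raw, level)) / len(raw)


random.seed(0)
n = 200_000

# Stand-in for trained weights (assumption): small Gaussian noise.
fake_weights = [random.gauss(0.0, 0.02) for _ in range(n)]
# Baseline that should compress extremely well: all zeros.
zeros = [0.0] * n

ratio_fake = compression_ratio(fake_weights)
ratio_zeros = compression_ratio(zeros)
print(f"gaussian float32: {ratio_fake:.3f} of original size")
print(f"all zeros:        {ratio_zeros:.5f} of original size")
```

To test real weights, swap `fake_weights` for the flattened tensors of an actual checkpoint; the ratio for real weights may well differ from this synthetic stand-in.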
Is this because they are afraid of the model being misused, e.g. for generating fake reviews? It is frustrating that I keep hearing great news about NLP but can't try any of it myself.
It's because the model weights are the valuable thing here. The fancy new architectures are nice and everything, but transformer models are a dime a dozen these days. Seems like they're using this as an example to point at and say "Hey, look at us, we support open source!", whereas unless you're willing to spend a small fortune on compute (possibly renting their GPUs) to train the weights yourself, these models are somewhat useless.