How do you define enormous? "It uses approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks". Also last time it took about a year for good replications to pop up.
A year is a fast time to replication in many scientific fields.
While substantial, the resources here are well within reach of many labs, research institutes, and organizations. For a result this big, I'd guess we'll have 2-6 additional implementations in the next 18 months. The problem has been 'open' for 40+ years, so that's lightning fast!
A couple of hundred GPUs is well within the reach of many even moderately well-heeled research institutes. It'd seem that about 3 weeks of compute time with 128 TPU v3s would be about $170,311.68.
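For anyone who wants to check that figure, here is a rough sketch of the arithmetic. The comment above doesn't state the hourly price it used; the $2.64 per core-hour below is an assumption, chosen only because it is the rate that reproduces the quoted total for 128 cores over 3 weeks. Actual cloud pricing may differ.

```python
# Back-of-the-envelope check of the quoted training cost.
CORES = 128                # TPU v3 cores, from the quoted description
HOURS = 3 * 7 * 24         # "about 3 weeks" of wall-clock time = 504 hours
RATE_PER_CORE_HOUR = 2.64  # ASSUMPTION: USD rate implied by the quoted total, not stated in the thread


def training_cost(cores: int, hours: int, rate: float) -> float:
    """Cost in USD of running `cores` cores for `hours` hours at `rate` per core-hour."""
    return cores * hours * rate


total = training_cost(CORES, HOURS, RATE_PER_CORE_HOUR)
print(f"${total:,.2f}")
```

Note this only covers a single final training run, not the experiments leading up to it.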
But of course that cost would only be for the final model. Anyway, I think I am just living in a different world... :-) We could never compete with that.
Yah, big grant money. Now the grad students programming the open source clones will only make approximately $0.56, or 4.2 Ramen packs, for their effort. ;)
Also worth keeping in mind that once a good open source model is available, researchers with fewer resources can still use it to fine-tune and get new results for far less than the cost of training a new model from scratch.
A lot of labs have access to the various strategic supercomputers of the USA.
Ex: Summit has 27,648 V100 GPUs (and those V100s have Tensor units). If you're saying that only 200 GPUs are needed to replicate the experiment, that doesn't even use up 1% of Summit's available capacity.
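A quick sketch of that ratio, using only the two numbers given above (200 GPUs is the upper end of the "~100-200 GPUs" estimate quoted at the top of the thread):

```python
# Share of Summit's GPUs that a 200-GPU job would occupy.
SUMMIT_GPUS = 27_648  # V100 count given in the comment above
NEEDED_GPUS = 200     # upper end of the quoted "~100-200 GPUs" estimate

share = NEEDED_GPUS / SUMMIT_GPUS
print(f"{share:.2%}")  # 0.72%
```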