by bnprks
885 days ago
I appreciate that the authors released code and weights with their paper! This is the first high-profile DeepMind paper I can recall that has runnable inference code + checkpoints released. (Though I'm happy to be corrected by earlier examples I've missed.) I don't yet see a public copy of the training set / example training code, but still this is a good step towards providing something other researchers can build on -- which is after all the whole point of academic papers!

Am I wrong that they are using a T5 model? At least they are using the SentencePiece T5 vocabulary.
How many GPU hours have they spent training this model? Which training parameters were used?
Don't get me wrong, I find this system fascinating; it is what applied engineering should look like. But I'd like to know more about the training details and the initial data they used, as well as the methods of synthetic data generation.