Hacker News
by bnprks 885 days ago
I appreciate that the authors released code and weights with their paper! This is the first high-profile DeepMind paper I can recall that has runnable inference code + checkpoints released. (Though I'm happy to be corrected by earlier examples I've missed)

I don't yet see a public copy of the training set / example training code, but still this is a good step towards providing something other researchers can build on -- which is after all the whole point of academic papers!


Yep, I'm missing the datasets as well. They generated 100M synthetic examples ... Were these examples generated with AlphaGeometry? Where is the filtering code and the initial input used to generate these synthetic examples?

Am I wrong that they are using a T5 model? At least they are using the SentencePiece T5 vocabulary.

How many GPU hours did they spend training this model? Which training parameters were used?

Don't get me wrong, I find this system fascinating; it is what applied engineering should look like. But I'd like to know more about the training details and the initial data they used, as well as the methods of synthetic data generation.

The methods section of the paper describes the training data generation as well as the model settings: https://www.nature.com/articles/s41586-023-06747-5#Sec16
ty for pointing that out!