by pk-protect-ai 885 days ago
Yep. I'm missing the datasets as well. They have generated 100M synthetic examples ... Were these examples generated with AlphaGeometry? Where are the filtering code and the initial input used to generate these synthetic examples?

Am I wrong that they are using a T5 model? At least they are using the SentencePiece T5 vocabulary.

How many GPU hours have they spent training this model? Which training parameters were used?

Don't get me wrong, I find this system fascinating; it is what applied engineering should look like. But I'd like to know more about the training details and the initial data they used, as well as the methods of synthetic data generation.

1 comment

The methods section of the paper describes the training data generation as well as the model settings: https://www.nature.com/articles/s41586-023-06747-5#Sec16
ty for pointing that out!