Hacker News
by hughleat 725 days ago
Yep! No GCC on this one. And yep, that's not far off how the pretraining data was gathered, but with random optimisations applied to give it a bit of variety.

Do you have more information on how the dataset was constructed?

It seems like build systems were somehow invoked, given the different targets present in the final version?

Was it mostly C/C++ (if so, how did you resolve missing includes/build flags), or something else?

We plan to have a peer-reviewed version of the paper, which will probably include more details on that. Otherwise, we can't give any more details than what's in the paper, the post, etc. without going through legal, which takes ages. Science is getting harder to do :-(