
I thought this specific model was trained via self-instruction using both synthetic prompts (generated by few-shot in-context prompting of, presumably, some OpenAI model; the original paper used text-davinci-002) and synthetic code (presumably from Code Llama 7B, as with self-instruct) that is subsequently validated by execution?
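To make the "validated by execution" part concrete, here is a rough Python sketch of what such a filter could look like. This is my own illustration, not the pipeline from either paper: `passes_tests`, `keep_validated`, and the sample dict keys are hypothetical names, and a plain subprocess is not a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(solution_src: str, test_src: str, timeout_s: float = 10.0) -> bool:
    """Run a candidate solution plus its tests in a fresh interpreter.

    Hypothetical helper: returns True only if the script exits with code 0.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution_src + "\n\n" + test_src + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Treat hangs and infinite loops as failed validation.
            return False
    return result.returncode == 0


def keep_validated(samples):
    """Keep only samples whose generated solution passes its generated tests.

    Assumes each sample is a dict with "solution" and "tests" source strings.
    """
    return [s for s in samples if passes_tests(s["solution"], s["tests"])]
```

Anything that fails an assertion, raises, or times out is dropped, so only code that runs correctly against its own tests ends up in the training set.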

The differences being that it's not just training on unvalidated synthetic data, and that this specific method (per the Unnatural Instructions paper) increases instruction diversity, which confers some added advantage and, I'm assuming, explains the performance gain over the self-instruct code, which is also synthetic?

I may be misunderstanding, but this seems more nuanced than just training on synthetic, AI-generated code, and it validates synthetic instructions (i.e. the low-resource setting) more than synthetic code (i.e. the high-resource setting).