Footnote 1 on page 2 explicitly mentions the 3.5 model and the research in this paper is only about auto completion: https://arxiv.org/pdf/2306.15033.pdf
Lastly, OpenAI states on the original Codex page: “OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.” - It included GitHub repos, but it never was only GitHub repos. https://openai.com/blog/openai-codex
As I said to the other commenter, I specifically avoided saying "only", I said "primarily", and I should have clarified that I meant "primarily fine-tuned". My point is simply that it is far more likely to spit out results that are patterned after GitHub than results that are patterned after any inter-programmer communications.
Also, I wasn't contesting that autocomplete uses GPT 3.5 as the base model, I was contesting the idea that it uses the same derivative model as chat.
Also, I wasn't contesting that autocomplete uses GPT 3.5 as the base model, I was contesting the idea that it uses the same derivative model as chat.