by prosim 992 days ago
Footnote 1 on page 2 explicitly mentions the 3.5 model, and the research in this paper is only about autocompletion: https://arxiv.org/pdf/2306.15033.pdf

And this blog post states “beyond Codex”, again for autocompletion: https://github.blog/2023-07-28-smarter-more-efficient-coding...

Lastly, OpenAI states on the original Codex page: “OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.” - It included GitHub repos, but it never was only GitHub repos. https://openai.com/blog/openai-codex

Update: A GitHub Community Manager confirms it here: https://github.com/orgs/community/discussions/56975#discussi...

1 comment

As I said to the other commenter, I specifically avoided saying "only"; I said "primarily", and I should have clarified that I meant "primarily fine-tuned". My point is simply that it is far more likely to spit out results patterned after GitHub than results patterned after any inter-programmer communications.

Also, I wasn't contesting that autocomplete uses GPT-3.5 as the base model; I was contesting the idea that it uses the same derivative model as chat.