by YeGoblynQueenne 1663 days ago
That sounds like too much input. Remember that Copilot is based on GPT-3, so its input size is limited to 2048 tokens.

I think it's simpler to assume that "get_*_input" is a common name for a function that reads input from a stream, and so this kind of string is common in Copilot's training data. Again, remember: GPT-3. That's a large language model trained on a copy of the entire internet (the CommonCrawl dataset) and then fine-tuned on all of GitHub. Given the abundance of examples of code on the internet, plus GitHub, most short programs that anyone is likely to write in a popular language like Python are already in there somewhere, in some form.

The form is an interesting question, which is hard to answer because we can't easily look inside Copilot's model (and it's a vast model to boot). The results are surprising perhaps, although the way Copilot works reminds me of program schemas (or "schemata" if you prefer). That's a common technique in program synthesis where a program template is used to generate programs with different variable or function names etc. So my best guess is that Copilot's model is like a very big database of program schemas. That, as an aside.
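To make the schema idea concrete, here's a rough sketch of what I mean by a program template with holes for names. This is only an illustration of schemas in general, not a claim about how Copilot works internally. The names SCHEMA and instantiate, and the body of the generated function, are made up for the example.

```python
from string import Template

# Hypothetical schema: a program template with named holes.
# ${kind} fills in part of the function name, ${var} the local variable name.
SCHEMA = Template(
    "def get_${kind}_input(stream):\n"
    "    ${var} = stream.readline().strip()\n"
    "    return ${var}\n"
)


def instantiate(kind: str, var: str) -> str:
    """Fill the schema's holes to produce the source of one concrete program."""
    return SCHEMA.substitute(kind=kind, var=var)


if __name__ == "__main__":
    # The same schema yields programs that differ only in their names.
    for kind, var in [("user", "line"), ("file", "text")]:
        print(instantiate(kind, var))
```

Each call produces the same program shape with different function and variable names, which is about all a schema is: the structure is fixed and the names are parameters.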

Anyway, I don't think it has to peek at other open files etc. Most of the time that would not be very useful to it.

1 comment

You're right, I was wrong.

> GitHub Copilot uses the current file as context when making its suggestions. It does not yet use other files in your project as inputs for synthesis. [1]

[1]: https://copilot.github.com/

Well, I just guessed. So... we arrived at the correct answer through our interaction? :)