by kiwih
1754 days ago
Yes, the work definitely lends itself to the question "is this better or worse than an equivalent human developer?"
This is quite a difficult question to answer, although I agree that simply giving a large number of humans the same prompts could be insightful. However, you would then be rating Copilot against an aggregate of humans rather than an individual (i.e. this is "the" Copilot). Also, knowing how research works, you would really be comparing against a random corpus of student answers, since it is usually students who participate in a study like this. Nonetheless, we think that simply quantifying Copilot's outputs is useful, as it can indicate how risky it might be to give the tool to an inexperienced developer who might be tempted to accept every suggestion.