by PaulHoule
964 days ago
I could swear I saw this the other day, but I couldn’t find it in YOShInOn. Here is an older paper that gets results like this for C/C++ but does a lot better with R; see Figure 1: https://arxiv.org/pdf/2308.04477.pdf The results of this kind of eval can be all over the place: you could pick out a set of examples worse than what I said (Games, C++) or pick one that is really good (Algorithms, Ruby). There was also this one, which showed some pitfalls: https://arxiv.org/abs/2310.12357 In some cases the LLM could say which project the source code was from, which meant it had seen that code in the training data, so the code ought not to be in the test data.