by DigitalNoumena
1027 days ago
I think the issue of test set contamination is important, but it’s academic: when a model contains a good enough distilled representation of arguably all the code out there, does it really matter whether it can generalise OOD? Realistically, how many of the practical use cases where it’ll be applied will be OOD? If you can take GPT4 there, then you are either a genius or working on something extremely novel, so why use GPT4 in the first place? I understand the goal is for LLMs to get there, but the majority of practical applications just don’t need that.
If it’s contaminated by the test set being in the model’s training set, then the test is no longer (assuming it was in the first place) a valid measure of whether the model has “a good enough distilled representation of arguably all the code out there”.