|
|
|
|
|
by zamfi
1230 days ago
|
|
What's missing is that they did not use the test set during the training process. A common split is train/validate/test, but all three are used during training -- train to actually train, validate for intermediate loss, test for model comparison. What you want is a fourth, held-out test set that isn't looked at until publish time. This paper has two test sets, but they have different data properties, and it's not clear they were held out until publication. |
|