by CobrastanJorji
57 days ago
Yeah, the blog distinguishes between "contamination," which it describes as polluting the training data with answers to benchmark questions, and "temporal leakage," which is polluting the training data with text written after the target date. But those seem to be nearly the same problem.
|
The latter is data that isn't supposed to be in the training set at all: in this case, anything written after 1930.