by vladislav
2338 days ago
They would have been able to win and get away with it if they had incorporated the knowledge of the external dataset directly into the ML model, provided they had a reasonable estimate of the fraction of overlap between the external data and the test set. A weak version of this would be to just train on the external data in addition to the provided data. A stronger version would train normally on the provided training data and, in addition, overfit on a random subset of some percentage of the external data (with some small random prediction error thrown in to obfuscate), which would get results equivalent to what they did with logic.
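To make the weak version concrete, here is a toy sketch of why training on overlapping external data inflates the score. Everything in it is an assumption for illustration, not the competition's data or anyone's actual code: the one-feature noise dataset, the 40% `overlap`, and the helper names `nearest_label`, `accuracy` and `make_rows` are all hypothetical. A 1-nearest-neighbour model stands in for "a model that can memorize".

```python
import random

def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training row."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, test):
    """Fraction of test rows whose label the 1-NN model predicts correctly."""
    return sum(nearest_label(train, x) == y for x, y in test) / len(test)

def make_rows(rng, n):
    # Hypothetical toy data: one random feature and a coin-flip label,
    # so there is no real signal to learn from unseen rows.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]

rng = random.Random(0)
provided = make_rows(rng, 200)   # the official training set
test = make_rows(rng, 100)       # the hidden test set

overlap = 0.4                    # assumed fraction of test rows also in the external data
n_leaked = int(overlap * len(test))
external = test[:n_leaked] + make_rows(rng, 100)

honest = accuracy(provided, test)             # trained on provided data only
leaky = accuracy(provided + external, test)   # "weak version": just add the external data

print(f"honest: {honest:.2f}  leaky: {leaky:.2f}")
```

The leaked rows are predicted perfectly because the model has seen them verbatim, so the score is lifted roughly in proportion to the overlap even though nothing generalizable was learned.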
In this competition, the training code was run on Kaggle's system, so you'd still need to smuggle in the extra data.