by willgdjones
2897 days ago
Wouldn't the "account creation" date be shared between the test and training data, and so essentially constitute a train/test leak? E.g. a user in the training set has an account creation date in their metadata. Any test set case would only need to look at the account creation date to identify the user.
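
To make the worry concrete, here is a rough sketch of what such a leak would look like. Everything in it is made up for illustration: the user names, the timestamps, and the `link_by_creation_date` helper are all hypothetical, and it assumes creation timestamps are exact and mostly unique per user. No learning is involved, just a dictionary lookup.

```python
from collections import defaultdict

def link_by_creation_date(train, test):
    """Match test records to training users using only the creation timestamp.

    train: list of (user_id, created_at) pairs (hypothetical layout)
    test:  list of (record_id, created_at) pairs (hypothetical layout)
    """
    # Index training users by their account creation timestamp.
    by_date = defaultdict(list)
    for user_id, created_at in train:
        by_date[created_at].append(user_id)

    # A test record is "identified" when exactly one training user shares its timestamp.
    matches = {}
    for record_id, created_at in test:
        candidates = by_date.get(created_at, [])
        if len(candidates) == 1:
            matches[record_id] = candidates[0]
    return matches

# Invented example data, not taken from any real dataset.
train = [
    ("alice", "2009-03-14T10:22:05"),
    ("bob", "2011-07-01T18:40:12"),
    ("carol", "2013-11-30T02:15:44"),
]
test = [
    ("t1", "2011-07-01T18:40:12"),
    ("t2", "2013-11-30T02:15:44"),
    ("t3", "2009-03-14T10:22:05"),
]

matches = link_by_creation_date(train, test)
```

If every test record can be resolved this way, any reported accuracy says more about the leaked field than about the model.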
The reality is that including that field in the metadata makes identifying a user from metadata trivial, and not an interesting case for ML. In order to publish, they "degraded" the data until it was just interesting enough to be headline-worthy. Insulting.