by zmjjmz
4578 days ago
> stop and think about possible sources of contamination

One great one from my Machine Learning professor was an assignment where we were required to normalize our data to [0,1]. After doing this and then going through the typical cross-validation cycle, he had us try to figure out where we had contaminated our validation sets. As it turns out, we had all normalized our data before splitting it up, which meant the scaling was computed from the held-out data as well as the training data, so the two influenced each other. It's a simple fix, but if you've done that and then spent a week training a large convolutional neural network only to find that you made a stupid error like that, it can be pretty painful. (Especially since the bad generalization error might not be obvious until you use the model in production.)
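For anyone who wants to see the fix concretely, here is a minimal sketch of what "split first, then normalize" might look like. It is not the code from the assignment: the helper names (`fit_min_max`, `apply_min_max`, `scaled_folds`) are made up for illustration, and it uses only the Python standard library. The point is that the min and max are taken from the training rows of each fold and then reused unchanged on the validation rows.

```python
import random


def fit_min_max(rows):
    """Return per-column (min, max) computed from the given rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale rows using previously fitted (min, max) pairs.

    Held-out values can land outside [0, 1]; that is expected, because
    the stats were never allowed to see them.
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, (lo, hi) in zip(row, stats)]
        for row in rows
    ]


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def scaled_folds(data, k=5, seed=0):
    """Yield (train_scaled, val_scaled) for each fold.

    The split happens first; the scaling stats are fitted on the
    training rows of that fold and only then applied to both parts.
    """
    folds = k_fold_indices(len(data), k, seed)
    for i, val_idx in enumerate(folds):
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        val = [data[j] for j in val_idx]
        stats = fit_min_max(train)  # training rows only
        yield apply_min_max(train, stats), apply_min_max(val, stats)
```

The contaminated version would call `fit_min_max(data)` once, before any splitting. If you use scikit-learn, putting the scaler inside a `Pipeline` and passing that pipeline to the cross-validation helper should give the same per-fold behaviour.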