|
|
|
|
|
by maxbond
326 days ago
|
|
The optimizer is functioning correctly, and the pattern really exists in the training data. But consider: - This behavior damages the model's performance on out of sample data; every word you predict during silence increases the transcript's Word Error Rate. - These translation credits are an artifact of our training data, and not a reflection of the process we are modeling (spoken language). So, while you are correct about the mechanism at work here, it is still correct to call learning a spurious pattern which damages our performance "overfitting". |
|