by xg15
335 days ago
I think it's a data quality problem first, which might lead to a sort of overfitting as a consequence. How would the AI know that a series of zero-amplitude audio samples should generate the string "[silence]"? It can only know that if the vast majority of silent audio segments in the training set are consistently labeled with that string. But that doesn't seem to be the case: silence is either not labeled at all, labeled with all kinds of different markers, or labeled with unrelated things like copyright credits. So even if the model successfully learns a generalized representation of the concept of "silence", it's not clear at all which of the different labels it should use for that concept. What might happen then is that the model starts to overfit on the tiny variations between individual silence segments, in a desperate attempt to find some kind of system behind all the different "silence" labels - which will of course go wrong spectacularly, because no such system exists. (Or if one does, it's entirely accidental and not something that should be learned.)
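
To make the labeling point concrete, here is a rough sketch of how one might check label consistency for silent segments. Everything in it is an assumption for illustration: the `(samples, label)` data layout, the helper names `is_silent` and `silence_label_counts`, and the toy data, which is made up and not taken from any real training set.

```python
from collections import Counter


def is_silent(samples, threshold=0.0):
    """Return True if every sample's absolute amplitude is at or below threshold."""
    return all(abs(s) <= threshold for s in samples)


def silence_label_counts(dataset, threshold=0.0):
    """Count the labels attached to silent segments in (samples, label) pairs."""
    return Counter(
        label for samples, label in dataset if is_silent(samples, threshold)
    )


# Made-up toy data (hypothetical): four silent segments, each labeled differently,
# plus one segment that actually contains audio.
toy_dataset = [
    ([0.0, 0.0, 0.0], ""),                          # silence, not labeled at all
    ([0.0, 0.0, 0.0], "[silence]"),                 # silence, one marker
    ([0.0, 0.0, 0.0], "(no audio)"),                # silence, a different marker
    ([0.0, 0.0, 0.0], "Subtitles by ExampleCorp"),  # silence, unrelated credit
    ([0.2, -0.1, 0.3], "hello there"),              # not silent, ignored
]

counts = silence_label_counts(toy_dataset)
top_label, top_count = counts.most_common(1)[0]

# Share of silent segments that carry the most common label.
majority_share = top_count / sum(counts.values())
print(counts)
print(f"majority label share: {majority_share:.2f}")
```

If the majority share is low, as in this toy case, there is no single string the model could consistently learn for silence, which is the data quality problem I mean.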