by pinkmuffinere
242 days ago
Why do the voices all sound so similar? I'm not talking about accent, I'm talking about the pitch, timbre, and other qualities of the voices themselves. For instance, all the phrases I heard sounded like they were said by a medium-set 45-year-old man. Nothing from kids, the elderly, or people with lower- or higher-pitched voices. I assume this is expected from the dataset for some reason, but I'm really curious about that reason. Did they just get many people with similar vocal qualities but a wide range of accents?
> By clicking or tapping on a point, you will hear a standardized version of the corresponding recording. The reason for voice standardization is two-fold: first, it anonymizes the speaker in the original recordings in order to protect their privacy. Second, it allows us to hear each accent projected onto a neutral voice, making it easier to hear the accent differences and ignore extraneous differences like gender, recording quality, and background noise. However, there is no free lunch: it does not perfectly preserve the source accent and introduces some audible phonetic artifacts.
> This voice standardization model is an in-house accent-preserving voice conversion model.