by wanderfowl
2581 days ago
I'm a speech scientist. This paper is a neat idea, and the results are interesting, but not in the way I'd expected. I had hoped it would explore how much person-specific information it can deduce from a voice (e.g., lip aperture, overbite, vocal tract size, openness of the nares). That would have been interesting from a speech perception standpoint. Instead, it's interesting more for how much social information it can deduce from a voice. It appears to be a relatively efficient classifier for gender, race, and age, taking voice as input. I'm sure this isn't the first time it's been done, but it's pretty neat to see it in action, and it's a worthwhile reminder: if a neural net is this good at inferring social, racial, and gender information from audio, humans are even better. And the idea of speech as a social construct becomes even more relevant.

I'm mostly deaf (cochlear implant), and one thing I've noticed is that if I watch things without my processor on (i.e., completely deaf), I can generally "guess" what a voice sounds like fairly accurately. I've wondered for a long time whether it's a trick of my mind, a quirk of statistics, or something that's actually possible.