by jobigoud 2581 days ago
> If a neural net is this good at inferring social, racial, and gender information from audio, humans are even better.

Why would humans automatically be better than machines at that task?

3 comments

We don't know this for sure, certainly, but given that things like social group, race and gender are fundamentally sociocultural phenomena (albeit with some physiological basis in some cases), I would assume that humans will have a considerable advantage. We are natively social beings with decades of social knowledge and learning, whereas these sorts of algorithms are at best seeing these things as epiphenomena in large datasets.

Plus, we have the advantage of understanding what social cues certain speech traits directly 'index', or serve to mark. For instance, I'll bet you can picture the voice of somebody whom you could clearly identify as white and male, but who would be exceedingly unlikely to have a long, bushy beard and wear a camouflage jacket. These cues are not anatomical but social, and they are not coincidence but broadcast social information. Sure, with enough data, we might be able to pick up on these as sort of emergent stereotypes, but we're attuned to such cues through our social experience. And these things are culturally specific, perhaps more so than a YouTube dataset would be.

I view this as a similar situation to using ML for evaluating things like humor, irony, or aesthetic beauty in cloudscapes: They might be able to bootstrap a model which starts with human judgements, or cluster things in such a way that a 'funny' category emerges, but they're a ways off from understanding the categories themselves, and I think that's relevant.

I think that's the scary thing. We don't even know if we know. It's all subconscious.

For example, most people can easily picture a person's gender, race, age, and where they are from based on their accent.

But I never realized that I also picture how fat they are, and can do it pretty well! It wasn't until I saw that this project can do it very reliably that I realized that I do it all the time too.

What else are we subconsciously picking up on? And as a counter defense, how can we better hide it? Do I need to change my vocabulary and topic choices to something more posh so they think I am eating healthier? What other info leaks are there?

This is a bit different but also an example that made me realize I unconsciously recognize some things I'm unaware of (the difference between pouring hot and cold water): https://www.youtube.com/watch?v=Ri_4dDvcZeM
Actually, I see no reason for humans to be any better at these tasks.