by hunter2_
1366 days ago
The relevant phenomena are known as localization and the cocktail party effect. If you put a microphone in the middle of a cacophonous cocktail party, it would be hard to follow any given conversation by listening to just that one combined signal. But if you're actually there, your brain can home in on any of several conversations. Having dialog in a center speaker means it comes from a different location than the music/fx, so it's easy to home in on it even if it's a little quieter than the music/fx. Having dialog in the same speakers as music/fx makes it much harder. The specified 5.1-to-2.x mixdown ratios might be good or might be inadequate depending on how correlated the original left track is with the original right track. A ridiculously loud blast only on the 5.1 left means your brain can hear dialog from your 2.1 right unimpeded. A medium-volume explosion on the 5.1 left and right (but not center!) leaves you with no 2.1 speaker producing dialog without it being masked by the explosion, especially if the explosion sound is mono-ish.
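To put rough numbers on those two cases, here's a minimal sketch. It assumes the center channel is folded into both stereo outputs at -3 dB (a commonly used downmix coefficient, not necessarily the ratio under discussion), ignores surrounds and LFE, and treats each channel as a single amplitude. The function names are made up for illustration.

```python
import math

CENTER_GAIN = 1 / math.sqrt(2)  # assumed -3 dB center fold-down

def downmix_levels(left_fx, right_fx, center_dialog, center_gain=CENTER_GAIN):
    """Return (dialog, fx) amplitude pairs for the left and right stereo outputs."""
    dialog = center_gain * center_dialog  # same dialog level lands in both outputs
    return (dialog, left_fx), (dialog, right_fx)

def dialog_to_fx_db(dialog, fx):
    """Dialog level relative to fx in dB; +inf means nothing competes with the dialog."""
    if fx == 0:
        return math.inf
    if dialog == 0:
        return -math.inf
    return 20 * math.log10(dialog / fx)

# Case 1: very loud blast on the 5.1 left only.
left_a, right_a = downmix_levels(left_fx=4.0, right_fx=0.0, center_dialog=1.0)
# Case 2: medium explosion on both 5.1 left and right.
left_b, right_b = downmix_levels(left_fx=1.0, right_fx=1.0, center_dialog=1.0)

for name, pair in [("1 left", left_a), ("1 right", right_a),
                   ("2 left", left_b), ("2 right", right_b)]:
    print(f"case {name}: dialog vs fx = {dialog_to_fx_db(*pair):.1f} dB")
```

In case 1 the right output still carries dialog with nothing on top of it; in case 2 both outputs have the dialog sitting below the explosion, so there's no clean speaker left to latch onto.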
|
That's because a human is not one microphone. It's two microphones a known distance apart, which allows real-time 3D positioning and isolation of sound to an area.
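As a rough illustration of what two microphones a known distance apart buy you, here's a sketch that estimates the arrival-time difference by brute-force cross-correlation and turns it into a bearing. It assumes a far-field source, a simple sin(theta) = c * dt / d model, and ~343 m/s for the speed of sound; the function names are hypothetical, and a real head or array is messier than this.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature

def best_lag(left, right, max_lag):
    """Lag in samples by which `right` trails `left` (brute-force cross-correlation)."""
    def score(lag):
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=score)

def bearing_deg(lag, sample_rate, mic_distance):
    """Far-field bearing from the delay; positive means toward the `left` mic."""
    s = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    s = max(-1.0, min(1.0, s))  # clamp in case noise pushes the delay past d / c
    return math.degrees(math.asin(s))

# A click that reaches the right mic 2 samples after the left mic.
left = [0, 0, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, 0, 0, 0]

lag = best_lag(left, right, max_lag=3)
angle = bearing_deg(lag, sample_rate=48_000, mic_distance=0.2)
print(lag, round(angle, 1))
```

A single microphone has no `lag` to measure, which is the whole point: the direction information lives in the difference between the two signals.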
The open-source hardware "ReSpeaker" lets you start experimenting with how a microphone array works, including why the cocktail party effect doesn't really affect us in most cases.
The notable exception is a signal generated exactly on the plane perpendicular to the line between the two ears. Then humans have a hard time telling whether it's in front or behind (a 180° swap). We can still get an angular vector to where the sound is. However, simply turning your head removes this ambiguity.
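Here's a toy model of that front/back swap, assuming the only cue is the interaural time difference and that it follows d * sin(angle) / c (ignoring the outer ear's filtering, which also helps in real life). Angles are in degrees from straight ahead; the 0.18 m ear spacing and the function name are assumptions for illustration.

```python
import math

def itd(source_deg, head_deg=0.0, ear_distance=0.18, c=343.0):
    """Interaural time difference (seconds) for a far-field source.

    source_deg: source direction in the room, 0 = the initial straight-ahead.
    head_deg:   how far the head has been turned in the same angular sense.
    """
    relative = math.radians(source_deg - head_deg)
    return ear_distance * math.sin(relative) / c

# Directly in front (0) and directly behind (180): both give ~zero delay.
print(itd(0.0), itd(180.0))

# Mirror-image directions off the midline are just as indistinguishable.
print(itd(30.0), itd(150.0))

# Turn the head 20 degrees and the front and back sources now pull in
# opposite directions, so the swap is resolved.
print(itd(0.0, head_deg=20.0), itd(180.0, head_deg=20.0))
```

The sign of the change after the head turn is what tells front from back: a source in front shifts toward one ear, a source behind shifts toward the other.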
(Also, being able to turn your head and move your body suggests there's a visual-acoustic SLAM algorithm going on in your brain.)