by marcan_42 2047 days ago
The reason why audio processing (not sampling) does fine without I/Q data is that our ears are almost completely insensitive to the phase relationships between different frequency components, and that additive frequency shifts are not musically useful. Those two things are exactly what is very hard to deal with without representing signals as I/Q. The audio world just doesn't care. Radio does. This is why most textbook audio equalizers (including those used in professional DAWs) have nonlinear phase by default (minimum-phase) unless you opt for an FIR- or FFT-based mode. That would never fly in radio.
3 comments

That's not really the reason. It's important to note that phase is not something absolute; it only makes sense in relation to something. Absolute phase would have to be defined from the beginning of the universe, which is nonsensical.

You are right that nobody can hear phase, but nobody can see phase either, again because you need to relate it to (interfere it with) something. However, it does make a difference when we think about the superposition (interference) of different audio frequency components. We would definitely hear some of those phase differences.

That said, IQ does not make sense in audio processing because it's baseband. There is no carrier wave.

We do not hear differences in the relative phase of different audio frequency components. Try it for yourself. Run a song through an allpass filter. It'll sound the same. In fact, speaker systems of all kinds do crazy things to the phase of signals, and nobody cares (what we care about is frequency and transient response).
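A rough sketch of the allpass experiment, for anyone who wants to check the numbers rather than listen. The noise input, the coefficient `a`, and applying the filter by multiplying FFT bins (which makes it a circular filter) are all assumptions made here for brevity. It only shows that the magnitude spectrum stays put while the waveform changes; whether it sounds the same is still something you have to judge by ear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)            # stand-in for an audio clip

# First-order allpass response H(z) = (-a + z^-1) / (1 - a z^-1), |H| = 1.
a = 0.7                                  # hypothetical allpass coefficient
w = 2 * np.pi * np.fft.rfftfreq(x.size)  # bin frequencies in radians/sample
z_inv = np.exp(-1j * w)
H = (-a + z_inv) / (1 - a * z_inv)

# Apply the response in the frequency domain (circular filtering).
X = np.fft.rfft(x)
y = np.fft.irfft(X * H, n=x.size)

# Same magnitude at every frequency, but a different waveform in time.
mag_unchanged = np.allclose(np.abs(np.fft.rfft(y)), np.abs(X))
waveform_changed = not np.allclose(x, y)
print(mag_unchanged, waveform_changed)
```

Swapping `x` for real song samples would give something you can actually play back and compare.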

The same is not true for radio. There, corrupting the phase relationships corrupts the data (for many systems).

Phase is relative, but our ears don't care about relative phase either (at least as long as you don't stick nonlinear filters after, then it starts mattering, but usually in audio things are fairly decorrelated anyway so it only matters in quite specific cases).

Here is an example: https://twitter.com/marcan42/status/1282685645731672064

Demo: https://twitter.com/zwegner/status/1282859889447116809 (interestingly, you can hear the change in Twitter's low quality encode, but it goes away at higher qualities, so it seems their crappy AAC encoder does care about relative phase :-))

It's counterintuitive how little our ears care about phase across frequency bands. This is not true for other kinds of signals.

I just tried this and you're right, completely random phase across the whole frequency band does not noticeably change things. Funny that we learned that differently. Thanks, I learnt something about audio today :).
afaik phase only really matters in audio for speaker enclosure design, placement, and likewise microphones.

However, because it's virtually unknown to audio folks, there are perceptible nodes everywhere, if you can hear them.

Audio processing isn't shifted down to baseband or shifted at all, so there is no need for IQ. It's all real. If instead of a direct mix down to baseband, you tell the SDR to mix the minimum frequency in the signal you care about down to just above zero, you can work without I and Q. For instance, if you mix an AM radio frequency down to audio frequency, it's all real and you can hear it and represent it as an array of real values.

Edit: this is how the Airspy SDR works, to avoid the IQ imbalance you get in the direct-conversion receivers in most SDRs.

Second edit for terminology: mixing is multiplying by a sinusoid of a given frequency to shift frequency. Baseband means you shifted the center of the frequencies you care about to zero, so half of the frequency content is negative. Negative frequencies are what drive that mean imaginary number into the whole thing.

You're making the mistake of assuming that the only purpose of IQ data is to represent negative frequencies after downconversion. This is not true. The IQ representation is extremely useful for certain kinds of processing, even if you're working in baseband. There are plenty of reasons to take a real baseband signal, run it through a Hilbert transform to get a Q, and process it as IQ data.
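As a minimal sketch of "run it through a Hilbert transform to get a Q": the helper below builds I + jQ from a real signal by zeroing the negative-frequency FFT bins, which is one common way to do it, not necessarily what any particular tool uses. The name `analytic_signal`, the sample rate, and the 440 Hz test tone are made up for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Return I + jQ for a real signal x, where Q is the Hilbert transform of x."""
    n = x.size
    X = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep DC as is
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist bin as is
        gain[1:n // 2] = 2.0            # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * gain)        # negative frequencies are zeroed

fs = 8000                               # hypothetical sample rate, Hz
t = np.arange(fs) / fs                  # one second of samples
i_part = np.cos(2 * np.pi * 440 * t)    # real baseband tone (the I channel)

iq = analytic_signal(i_part)
q_part = iq.imag                        # for this tone, a 90-degree shifted copy
envelope = np.abs(iq)                   # instantaneous amplitude
```

With `iq` in hand, things like instantaneous amplitude (`np.abs`) and instantaneous phase (`np.angle`) become one-liners, which is the kind of processing the IQ representation makes convenient.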

It just so happens that audio DSP algorithms happen to almost never care about those exact kinds of processing, due to the way our ears and brains work. And thus, IQ data is not used in audio. But it's not because it's baseband. It's because our ears don't care about phase relationships (which is one thing you can more easily preserve in the IQ domain) and because frequency shifts like downconversion are not useful in music since they destroy the harmonic relationships in the sound.

I wouldn't have used the terms real and baseband together, but I think I understand what you mean. I've been frustrated when people describe a modulation as real when they could have described it more elegantly as complex. With modern floating point registers being so large the phase loss is less important, but sometimes the representation just makes more sense symmetrical around zero (DC). Could you explain what you mean by harmonic relationships in sound? Does that imply AM will destroy some quality of the music even if you used a 22 kHz wide band?
I mean if you add 10Hz to all frequency components in audio, what used to be harmonics (rational multiples of the fundamental frequency) stop being harmonics and it sounds like a dissonant mess. There is no reason to ever frequency-shift music/audio by an offset (i.e. the same thing modulation does in radio, or multiplying by a carrier in the IQ time domain). The only frequency shifting we do in audio is by multiplying the frequencies (that's resampling in the time domain), which is a different story.

100, 200, 400 Hz is a consonant tone, while 110, 210, 410 Hz is a dissonant mess.
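To put numbers on that, here is a small sketch comparing an additive shift (what mixing or modulation does) with a multiplicative one (what resampling does). The 1.1 scale factor is an arbitrary choice made here so the lowest partial lands on 110 Hz in both cases.

```python
partials = [100.0, 200.0, 400.0]          # Hz, the consonant example

shifted = [f + 10.0 for f in partials]    # additive shift: 110, 210, 410 Hz
scaled = [f * 1.1 for f in partials]      # multiplicative shift: about 110, 220, 440 Hz

def ratios(freqs):
    """Each partial's frequency relative to the lowest one, rounded."""
    return [round(f / freqs[0], 3) for f in freqs]

print(ratios(partials))   # [1.0, 2.0, 4.0]
print(ratios(shifted))    # [1.0, 1.909, 3.727]
print(ratios(scaled))     # [1.0, 2.0, 4.0]
```

The additive shift breaks the 1:2:4 ratios, while scaling keeps them intact.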

AM doesn't have this problem because it has symmetric sidebands and a carrier (so a tuning offset does not result in audio frequency shift), but SSB does. If you listen to an SSB transmission without your tuning being perfect, it sounds horrible. Voice sounds distorted, and music is hideous. I'm having trouble finding an example of the latter, probably because nobody dares put music through SSB :-) (but you can do this easily enough in gnuradio by upconverting a song with a 10Hz offset, for example)

Also, thank you for taking the time to educate me. I didn't take any signal processing classes in my EE degree, so I learned everything on the job and have gaps. How does autotune not sound horrible? If they do it right it is indistinguishable, or so I have read.
Autotune works by resampling and doing time stretching (not sure if in the time or frequency domain, depends on the technology; there are many variant ways of doing this) in order to decouple pitch and duration to make adjustments, so it doesn't break harmonic relationships.

Audio time stretching (or equivalently, changing the pitch without changing time) is not a clearly defined process with a mathematical description (unlike plain resampling or modulation) but rather a semi-heuristic process that takes into account psychoacoustics. But yes, in practice, for small adjustments of a monophonic sample like a voice, modern algorithms sound really good.

I've heard what you are talking about in SSB, I just didn't know what it was. I don't quite understand the AM thing: is it that a tuning offset would grab the image of the other sideband to correct stuff?
AM reception basically uses envelope tracking, so you don't really care about the carrier frequency. It's really just "how much power am I receiving". The tuning ends up defining the window of spectrum you average power over.

In the frequency domain, you could think of AM demodulation as computing the width (and phase!) between the carrier images on both sidebands. It doesn't matter if the signal is a bit off to the side, because the width will be the same. You have a mirror image which gives an absolute reference.

In the IQ domain, you look at the magnitude of the vectors, not their angle, so you don't care about the frequency.
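The IQ-domain view can be sketched in a few lines. This assumes an idealized, noise-free AM signal already at complex baseband, a known carrier level of 1.0 to subtract, and a made-up sample rate, test tone, and tuning error; real receivers have to deal with filtering and gain control on top of this.

```python
import numpy as np

fs = 48_000                                    # hypothetical IQ sample rate, Hz
t = np.arange(fs // 10) / fs                   # 100 ms of samples
message = 0.5 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone, 50% depth
envelope = 1.0 + message                       # carrier plus both sidebands

def am_demod(tuning_offset_hz):
    """Envelope-detect an AM signal received with a given tuning error."""
    # A tuning error just makes the IQ vector rotate at the offset frequency.
    iq = envelope * np.exp(2j * np.pi * tuning_offset_hz * t)
    # Only the vector's length is used; its angle is ignored.
    return np.abs(iq) - 1.0                    # remove the (assumed) carrier level

on_tune = am_demod(0.0)
off_tune = am_demod(37.0)                      # 37 Hz off, same audio out
```

Both calls recover the same `message`, because the rotation caused by the offset never touches the magnitude.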

In SSB you only have one sideband, and often no carrier at all, so there is no reference. You need to nail the frequency to get a proper signal out. And even then AIUI your phases will be random, though that doesn't matter for audio.

Yes, our ears hear power not amplitude so phase isn't so important... except maybe for strange mixing products and reflections off of walls? OK the Audiophiles can hear some of that, but generally it's true if you're listening through headphones.
We do hear relative phase relationships at any given frequency between both ears. So if you phase shift one side of a stereo signal and not the other, then yes, that is very audible.

But nodes and mixing products are independent of overall phase across the power spectrum, in a linear system. So if you apply the same phase change to both left and right, the distribution of nodes in the room won't change. The only time these inter-frequency phase relationships start to matter is when you introduce nonlinearities, like distortion.

Yes, directional hearing is quite sensitive to phase, but there are often multi-reflections inside the outer ear that allow some people to hear phase discontinuities in mono.
Anyone can hear phase discontinuities because any phase discontinuity is just a burst of high frequency content.

But typical reflections off of surfaces are largely linear as far as I know, and any linear operation will not introduce any power spectrum changes that are phase dependent. As far as I know, the ear canal can be largely modeled as a linear system (to within the thresholds of hearability).

The only way to hear phase is to introduce a nonlinearity. That then generates harmonics (or sometimes even lower frequencies), and their power spectrum depends on the specific phase relationships of the incoming signal.

A physical example of a nonlinearity would be a vibrating surface that hits another surface at a certain excursion. Depending on the relative phases of the excitation signal, you can have different peak excursion, and therefore clearly get a different result if one phase set makes it reach the other surface and another one doesn't.
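A toy version of that surface-hitting-a-stop example, as a sketch: two signals with identical magnitude spectra but different relative phase are passed through a hard clipper. The 1.9 excursion limit, the choice of a fundamental plus its second harmonic, and the names `two_tones` and `hits_stop` are all hypothetical and picked here so that one phase set reaches the stop and the other does not.

```python
import numpy as np

n = 1024
theta = 2 * np.pi * np.arange(n) / n      # one period of the fundamental

def two_tones(phase):
    """Fundamental plus second harmonic with a given relative phase."""
    return np.cos(theta) + np.cos(2 * theta + phase)

def hits_stop(x, limit=1.9):
    """Hypothetical mechanical stop: excursion is clipped at +/- limit."""
    return np.clip(x, -limit, limit)

a = two_tones(0.0)                        # peaks line up, excursion reaches 2.0
b = two_tones(np.pi / 2)                  # same magnitudes, peak around 1.76

# Before the nonlinearity the two are indistinguishable by power spectrum.
same_input_spectrum = np.allclose(np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b)))

# After it, only the signal whose peaks line up gets altered.
a_distorted = not np.array_equal(hits_stop(a), a)
b_distorted = not np.array_equal(hits_stop(b), b)
print(same_input_spectrum, a_distorted, b_distorted)
```

The clipped copy of `a` picks up new harmonics while `b` passes through untouched, so the output power spectrum now depends on the input phase.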