Hacker News
by NinetyNine 5226 days ago
In my understanding, 44.1 kHz was chosen because it's twice the maximum of human hearing (22 kHz), and thus you can reproduce all audible sounds without worrying about aliasing (as per the Nyquist-Shannon sampling theorem). What is the point of going higher?

When downsampling or recording, and when playing back 44.1 kHz audio, the data must be low-pass filtered to eliminate aliasing effects.

But filters aren't perfect. Even a decent low-pass filter (say 3rd order Butterworth) requires an order of magnitude of bandwidth to drop the output 60 dB (practically inaudible). This means that with a Nyquist limit of 22 kHz, you're either attenuating everything above 2.2 kHz (the "knee"), or you're letting some aliasing noise through.

With a 192 kHz sampling rate, the filter's knee can rise to 9.6 kHz, and the stuff between 9.6 kHz and 20 kHz won't be appreciably attenuated.
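
The "order of magnitude for 60 dB" figure follows from a 3rd-order Butterworth falling about 60 dB per decade. A minimal sketch, assuming the textbook magnitude response of an ideal analog Butterworth low-pass; the helper name `butterworth_gain_db` is just for illustration:

```python
import math

def butterworth_gain_db(f_hz, cutoff_hz, order):
    """Magnitude of an ideal analog Butterworth low-pass at f_hz, in dB."""
    ratio = f_hz / cutoff_hz
    return -10 * math.log10(1 + ratio ** (2 * order))

for sample_rate in (44_100, 192_000):
    nyquist = sample_rate / 2
    knee = nyquist / 10  # one decade below Nyquist, per the rule of thumb above
    gain = butterworth_gain_db(nyquist, knee, order=3)
    print(f"{sample_rate} Hz: knee {knee:.0f} Hz, {gain:.1f} dB at Nyquist")
```

Both sample rates come out at about -60 dB at Nyquist, because in each case the knee sits exactly one decade below it.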

It's important to note also that this attenuation can't be fixed by simply boosting the high end -- filters are linear, so such an adjustment (with an equivalent order filter) would merely cancel out the low pass filtering and reintroduce aliasing noise.

(Edit: I am not an audio engineer but I have a strong signal theory background. So actual audio engineers please feel free to correct me.)

You have the right idea, but the wrong numbers. Terribly, terribly wrong numbers; it's quite clear you're making those numbers up. No filter for an audio ADC would ever have a cutoff as low as 2.2 kHz, not even if it were for a telephone. (You have a strong signal theory background? No offense, but really?)

When you use a 44.1 kHz sampling frequency, any frequency above 22.05 kHz will be "aliased" and recorded as a lower frequency. This sounds incredibly nasty, like the sounds you'd get out of a broken Commodore 64. So you have to remove frequencies above 22.05 kHz in order to get a clean recording. But human ears can hear up to 20 kHz or so, depending on age (e.g., NTSC TVs with cathode ray tubes whine at their roughly 15.7 kHz horizontal scan rate, which drives me nuts, but my parents can't hear it at all).
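
The folding can be computed directly: a pure tone at f, sampled at fs, shows up at the distance from f to the nearest multiple of fs. A small sketch; `alias_frequency` is a hypothetical helper and only models a single ideal sinusoid:

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

fs = 44_100
for tone in (1_000, 20_000, 25_000, 30_000):
    print(f"{tone} Hz tone -> recorded as {alias_frequency(tone, fs)} Hz")

# Sanity check: sampled 25 kHz and 19.1 kHz cosines are sample-for-sample identical.
same = all(
    math.isclose(math.cos(2 * math.pi * 25_000 * n / fs),
                 math.cos(2 * math.pi * 19_100 * n / fs), abs_tol=1e-9)
    for n in range(100)
)
print("25 kHz and 19.1 kHz samples match:", same)
```

At 44.1 kHz a 25 kHz tone lands at 19.1 kHz and a 30 kHz tone at 14.1 kHz, squarely in the audible band, while tones below 22.05 kHz come through unchanged.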

The trick, then, is to design a filter that will let the 20 Hz-20 kHz band through while stopping everything above 22.05 kHz. We call 20 Hz-20 kHz the "pass band" and 22.05 kHz and above the "stop band". We don't really care what happens to the frequencies between the pass band and the stop band: the range from 20 kHz to 22.05 kHz can't be heard well enough to be worth preserving, and it doesn't cause aliasing, so it doesn't need to be cut out. This is difficult because the stop band starts at only 1.1x the frequency of the top of the pass band; for you musicians out there, that's less than the interval between C and D on the Western scale. (Just think for a minute: design a filter that lets middle C through, but completely filters out the D above.)
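
To check the musical comparison, the interval between two frequencies in equal-tempered semitones is 12·log2(f_high/f_low). A quick sketch (the C4 and D4 frequencies are rounded standard A440 tuning values):

```python
import math

def interval_semitones(f_low_hz, f_high_hz):
    """Musical interval between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(f_high_hz / f_low_hz)

transition = interval_semitones(20_000, 22_050)
whole_tone = interval_semitones(261.63, 293.66)  # middle C (C4) to the D above it
print(f"pass band to stop band: {transition:.2f} semitones; C4 to D4: {whole_tone:.2f}")
```

The 20 kHz to 22.05 kHz gap works out to about 1.7 semitones, a bit less than the 2-semitone whole step from C to D.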

Heavens, no, you wouldn't use a Butterworth filter for such a task. We want an elliptic filter, probably. 3rd order is no good either; it won't give a sharp enough cutoff. 8th order is better. This will get you a cutoff around 20 kHz with something like 60 dB attenuation at 22.05 kHz. People need a lot of these filters, so you can actually go out and buy a 20 kHz low-pass 8th-order elliptic filter as a monolithic chip.
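
How much attenuation an elliptic filter of a given order can reach follows from the elliptic degree equation, K(k)·K'(k1) / (K'(k)·K(k1)) = N, where k = f_pass/f_stop, k1 = eps_p/eps_s, and K is the complete elliptic integral of the first kind. The sketch below solves it for the stopband attenuation using only the standard library (K via the arithmetic-geometric mean, k1 recovered from its nome with Jacobi theta series). It assumes an ideal analog prototype; the ripple values are hypothetical choices, not the specs of any particular chip:

```python
import math

def complete_elliptic_k(k):
    """Complete elliptic integral of the first kind K(k) for 0 <= k < 1, via the AGM."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-14 * a:
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return math.pi / (2.0 * a)

def elliptic_stopband_attenuation_db(order, f_pass_hz, f_stop_hz, ripple_db):
    """Stopband attenuation (dB) an ideal analog elliptic low-pass of this order reaches.

    Solves K(k1')/K(k1) = order * K(k')/K(k) for k1 = eps_p / eps_s,
    where k = f_pass / f_stop and k' = sqrt(1 - k^2).
    """
    k = f_pass_hz / f_stop_hz                      # selectivity factor, must be < 1
    k_prime = math.sqrt(1.0 - k * k)
    q = math.exp(-math.pi * complete_elliptic_k(k_prime) / complete_elliptic_k(k))  # nome of k
    q1 = q ** order                                # nome of k1, from the degree equation
    # Recover k1 from its nome: k1 = (theta2(q1) / theta3(q1))**2
    theta2 = 2.0 * q1 ** 0.25 * sum(q1 ** (n * (n + 1)) for n in range(5))
    theta3 = 1.0 + 2.0 * sum(q1 ** (n * n) for n in range(1, 5))
    k1 = (theta2 / theta3) ** 2
    eps_p = math.sqrt(10 ** (ripple_db / 10) - 1)  # passband ripple parameter
    eps_s = eps_p / k1                             # stopband parameter, since k1 = eps_p / eps_s
    return 10 * math.log10(1 + eps_s ** 2)

for ripple in (0.1, 1.0):
    atten = elliptic_stopband_attenuation_db(8, 20_000, 22_050, ripple)
    print(f"8th order, {ripple} dB ripple: {atten:.1f} dB at 22.05 kHz")
```

With 1 dB of passband ripple, an 8th-order design lands close to 60 dB at 22.05 kHz; asking for only 0.1 dB of ripple costs roughly 10 dB of stopband attenuation.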

Let's suppose you chose a 48 kHz sampling rate instead. Now the stopband starts at 24 kHz instead of 22.05 kHz. It sounds like a small difference (22.05->24 kHz cutoff), but it's actually roughly a factor of 2 (2.05->4 kHz transition band). This means that with the same components, you can get 80 dB or more attenuation in the stop band.
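
The "factor of 2" refers to the width of the transition band rather than the cutoff itself. A quick check; the 20 kHz passband edge is the one used throughout this thread and `transition_band_hz` is a hypothetical helper:

```python
PASS_EDGE_HZ = 20_000  # top of the band we want to keep

def transition_band_hz(sample_rate_hz):
    """Width of the region between the 20 kHz passband edge and the Nyquist frequency."""
    return sample_rate_hz / 2 - PASS_EDGE_HZ

narrow = transition_band_hz(44_100)
wide = transition_band_hz(48_000)
print(f"44.1 kHz: {narrow:.0f} Hz, 48 kHz: {wide:.0f} Hz, ratio {wide / narrow:.2f}")
```

The transition band grows from 2050 Hz to 4000 Hz, a ratio of about 1.95.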

Now go to 96 kHz. You have to design a filter that rolls off between 20 kHz and 48 kHz. That's easy peasy, and you can reduce the ripple, increase the attenuation, maybe reduce the order (affecting noise) and make all sorts of design tradeoffs that are much easier.
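
One way to put "easy peasy" in numbers: the minimum order of a plain Butterworth filter that is 3 dB down at 20 kHz and 60 dB down at Nyquist. This is a sketch using the standard Butterworth order formula for an ideal analog prototype; the 3 dB and 60 dB targets are assumptions chosen for illustration:

```python
import math

def butterworth_min_order(f_pass_hz, f_stop_hz, pass_loss_db=3.0, stop_atten_db=60.0):
    """Smallest Butterworth order with at most pass_loss_db at f_pass_hz
    and at least stop_atten_db at f_stop_hz (ideal analog prototype)."""
    numerator = math.log10((10 ** (stop_atten_db / 10) - 1) / (10 ** (pass_loss_db / 10) - 1))
    return math.ceil(numerator / (2 * math.log10(f_stop_hz / f_pass_hz)))

for sample_rate in (44_100, 48_000, 96_000):
    order = butterworth_min_order(20_000, sample_rate / 2)
    print(f"{sample_rate} Hz: Butterworth order {order} for 60 dB at Nyquist")
```

That comes to order 71 at 44.1 kHz and 38 at 48 kHz, which is why nobody uses a Butterworth there, but only order 8 at 96 kHz, where even this least selective classic design becomes practical.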

Now think about 192 kHz. What's the point? What does 192 kHz get you that 96 kHz doesn't have? It's already easy enough to design a very nice system at 96 kHz. I think 192 kHz is a bunch of bunk as far as audio is concerned.

That's recording. Now let's talk about playback.

Playback is very similar; everything just goes in the opposite direction. You start with a digital signal, convert it to analogue, and put it through a low-pass filter. The aliasing noise is still there (on playback it's usually called imaging), except instead of reflecting high frequencies to low ones, it reflects low frequencies to high ones. So you get the same trade-offs.
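
A tone at f sampled at fs also appears at k·fs ± f for every positive integer k until the reconstruction filter removes those images. A minimal sketch; `image_frequencies` is a hypothetical helper that ignores the DAC's own sinc-shaped droop:

```python
def image_frequencies(tone_hz, sample_rate_hz, count=2):
    """First `count` pairs of spectral images (k*fs - f, k*fs + f) of a sampled tone."""
    images = []
    for k in range(1, count + 1):
        images += [k * sample_rate_hz - tone_hz, k * sample_rate_hz + tone_hz]
    return images

print(image_frequencies(1_000, 44_100))
print("lowest image of a 20 kHz tone:", min(image_frequencies(20_000, 44_100)), "Hz")
```

The lowest image of a 20 kHz tone at 44.1 kHz sits at 24.1 kHz, so the playback filter faces nearly the same narrow transition band as the recording one.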

The difference is that playback requirements are not as difficult as recording requirements. In particular, the required SNR of a playback system is lower than that of a recording system. I think 48 kHz is fine for playback.

The problem is these stupid 192 kHz systems have backers with big names who never bothered to do proper double-blind tests to figure out if the difference is actually perceptible. You can even get a 384 kHz system these days, which would be overengineered for dogs and is more than good enough for bats.

192 kHz is a "bigger number = better" marketing ploy imo. Don't get me started on interpolating "240 Hz" TVs.

The move from 16 bit to 24, however, is significant. And not only for headroom.
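
For headroom and noise floor, the textbook figure is the quantization SNR of a full-scale sine wave, about 6.02 dB per bit plus 1.76 dB. A sketch of that formula, assuming an ideal quantizer with no dither:

```python
import math

def sine_sqnr_db(bits):
    """Theoretical SNR of a full-scale sine through an ideal `bits`-bit quantizer."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: about {sine_sqnr_db(bits):.1f} dB")
```

That gives roughly 98 dB for 16 bits and roughly 146 dB for 24 bits.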

As far as sample rates go, I really like how 96 kHz can sound. And intuitively, I'm more comfortable putting the limit of human hearing closer to 24 or 25 kHz than 22 kHz.

If I understand you correctly, going from 44.1 kHz to 48 kHz would be worth it on the playback side of things?

That wouldn’t seem like it would be all that hard to do. CDs are a legacy format now, and AAC files don’t care about the sampling rate (as long as you don’t need to go beyond 96 kHz).

What’s stopping that? Do all the audio engineers have pipelines that are only capable of outputting 44.1 kHz? (I imagine someone at Sony Music sitting in a dark room and ripping CDs all day; probably not true, but a funny enough picture.)

Then again, after doing a blind test (256 kbps AAC vs. CD) and being unable to tell the difference (yeah, I know, that’s not the same as a difference in sampling rate), I’m skeptical of all supposed small improvements in audio quality on the playback side.

Last year I bought a 14-input, 24-bit, 44.1/48/88.2/96 kHz audio interface for $100. Granted, it's not supposed to cost so little: no one cares about MSRP, but it runs twice that on Amazon, more at Guitar Center, and typically no less than $170 on eBay. I just happened to get a good deal on Craigslist.

Anyway, the issue isn't one of hardware limitations. Call it old guys fearing technology, call it the Red Book cartel, call it whatever you want: it's inertia.

It's also worth pointing out that for the last 20 years, most audio converters have been of the "oversampling" type, one of the main advantages being that a much gentler anti-aliasing filter can be used, with fewer artefacts reaching down into the audible frequency range.
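
A rough sketch of why oversampling relaxes the analog filter: the converter samples at M times the target rate, so the analog filter only has to be steep enough by the oversampled Nyquist frequency, and a digital decimation filter does the sharp cut near 20 kHz. The oversampling factors, the 40 kHz corner, and the 2nd-order Butterworth are hypothetical choices for illustration, not a description of any particular converter:

```python
import math

BASE_RATE_HZ = 44_100   # target output rate
CORNER_HZ = 40_000      # hypothetical corner of a gentle analog filter

def butterworth_gain_db(f_hz, corner_hz, order):
    """Magnitude of an ideal analog Butterworth low-pass at f_hz, in dB."""
    return -10 * math.log10(1 + (f_hz / corner_hz) ** (2 * order))

loss_in_band = butterworth_gain_db(20_000, CORNER_HZ, 2)
print(f"2nd-order Butterworth, 40 kHz corner: {loss_in_band:.2f} dB at 20 kHz")
for factor in (1, 4, 64):
    nyquist = factor * BASE_RATE_HZ / 2  # Nyquist frequency of the oversampled converter
    print(f"{factor:>2}x oversampling: {butterworth_gain_db(nyquist, CORNER_HZ, 2):.1f} dB "
          f"at {nyquist / 1000:.1f} kHz")
```

At 64x the gentle filter is roughly 62 dB down at the oversampled Nyquist frequency while costing only about a quarter of a dB at 20 kHz; at 1x it does essentially nothing.
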

Hence the disclaimer :) Thanks for the better numbers; I didn't realize such high-order filters were common in consumer equipment.

Partly yes, but also because of the data requirements of the format (CD). Bit depth is important too, because it defines the number of discrete levels available to you when measuring a wave, so the more bit depth you get, the closer the digital representation of the wave comes to the original analogue source.
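
A small illustration of levels and rounding error, assuming an idealized uniform quantizer over [-1, 1] that ignores clipping at full scale; the `quantize` helper and the 1 kHz test tone are hypothetical:

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest level of an idealized `bits`-bit quantizer (no clipping)."""
    step = 2.0 / (2 ** bits)  # spacing between adjacent levels over the [-1, 1] range
    return round(x / step) * step

samples = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(441)]
for bits in (16, 24):
    worst = max(abs(s - quantize(s, bits)) for s in samples)
    print(f"{bits}-bit: {2 ** bits} levels, worst rounding error {worst:.2e}")
```

Each extra bit doubles the number of levels and halves the bound on the rounding error, so the 24-bit bound is 256 times smaller than the 16-bit one.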

This is a point that often gets lost in these discussions: digital audio is meant to be a more efficient way of capturing and distributing an analogue source, so the more information you can capture about each instant in time, the closer your digital reproduction can get to the original analogue wave.