Hacker News
by f5ve 837 days ago
If you're thinking "The highest frequency I need my signal to be able to reproduce is X, so I should set my sampling rate to 2X," then you're wrong, and this article gives several reasons why.

As far as I can tell, though, it doesn't mention what may be the most important reason (especially to the folks here on Hacker News): resampling and processing.

This is why professional-grade audio processing operates at sample rates many multiples higher than the limit of human hearing. It's not because of the quality difference between, say, 192 and 96 kHz, but because if you're resampling or iterating a process dozens of times at those rates, artifacts will eventually form and make their way into the range of human hearing (up to about 20 kHz).


You’re right, but I fear this idea has become prevalent in audiophile communities where they only want to listen to files that are 96kHz or higher.

In my opinion, a high sample rate only really matters during the production phase and has no noticeable effect on the final product. If the producer uses a high sample rate during creation, I see no reason why the listener would care whether the file they're listening to is above 44.1kHz, unless they plan to use it for their own production.

People should prefer 48k over 44.1k, but not for fidelity. It would just make the world a better place if 44.1k audio files died out. The reasons 44.1k was chosen no longer apply today, yet we're stuck with it, so every audio stack needs to be able to convert between 44.1/88.2 and 48/96. That's a solved problem, but it involves a tradeoff between fidelity and performance that makes the resampling algorithm a critical design feature of those stacks.

All because Sony and Philips wanted 74 minutes of stereo audio on CDs decades ago.

https://en.wikipedia.org/wiki/PCM_adaptor

It's very likely that the 44.1 kHz rate comes from the PCM adaptors that were designed to take PCM audio and convert it to something that a video tape recorder would accept.

I watched a YouTube video a few months ago about these adaptors, and the presenter did the calculations showing how the 44.1 kHz, 16-bit format lines up with the video fields. There was a valid engineering reason for this sampling rate.
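
For reference, the arithmetic usually quoted for this (not necessarily the exact calculation in the video) is that the adaptor stored 3 samples on each usable video line: 245 usable lines per field at 60 fields/s for NTSC, and 294 lines per field at 50 fields/s for PAL. The line counts below are those commonly cited figures, treated here as assumptions, and `pcm_adaptor_rate` is just an illustrative helper:

```python
# Commonly cited PCM-adaptor figures (assumed, not taken from the video above).
SAMPLES_PER_LINE = 3

VIDEO_FORMATS = {
    # name: (usable lines per field, fields per second)
    "NTSC": (245, 60),
    "PAL": (294, 50),
}

def pcm_adaptor_rate(lines_per_field, fields_per_second):
    """Audio samples per second when each usable line holds SAMPLES_PER_LINE samples."""
    return SAMPLES_PER_LINE * lines_per_field * fields_per_second

for name, (lines, fields) in VIDEO_FORMATS.items():
    print(f"{name}: {SAMPLES_PER_LINE} x {lines} x {fields} = "
          f"{pcm_adaptor_rate(lines, fields)} Hz")
```

Both come out to 44,100 Hz, which is why one rate could serve both video systems.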

However, the stories about a Sony executive having a particular piece of music in mind are true; they have to do with the disc diameter being enlarged compared to what Philips originally had in mind. By that time the bitrate had already been decided.

I still agree that 48 kHz is a better choice today, especially after reading this paper.

Beethoven's 9th.

> Kees Immink, Philips' chief engineer, who developed the CD, recalls that a commercial tug-of-war between the development partners, Sony and Philips, led to a settlement in a neutral 12-cm diameter format. The 1951 performance of the Ninth Symphony conducted by Furtwängler was brought forward as the perfect excuse for the change,[76][77] and was put forth in a Philips news release celebrating the 25th anniversary of the Compact Disc as the reason for the 74-minute length.

https://en.wikipedia.org/wiki/Symphony_No._9_(Beethoven)#Com...

What _is_ the reason people should prefer 48k over 44.1k though?

To avoid having to do non-integer resampling in software, since everything but music has basically standardized on 48k and most platforms default to it.

All TV and computer audio runs at it, and for TV/film purposes 48000 is a very nice round number.
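
One way to read "nice round number" for TV/film (an interpretation, not something the comment spells out) is samples per video frame: at 24, 25 and 30 fps, 48 kHz gives a whole number of samples per frame, while 44.1 kHz does not at 24 fps. The frame-rate list is chosen for illustration, and `samples_per_frame` is a hypothetical helper:

```python
from fractions import Fraction

# Frame rates chosen for illustration; NTSC "29.97" is exactly 30000/1001.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "30": Fraction(30),
    "29.97": Fraction(30000, 1001),
}

def samples_per_frame(sample_rate, fps):
    """Exact (rational) number of audio samples per video frame."""
    return Fraction(sample_rate) / fps

for rate in (48_000, 44_100):
    for name, fps in FRAME_RATES.items():
        spf = samples_per_frame(rate, fps)
        kind = "whole" if spf.denominator == 1 else "fractional"
        print(f"{rate} Hz at {name} fps: {spf} samples/frame ({kind})")
```

Neither rate divides evenly into 29.97 fps NTSC frames, so that case needs a repeating cadence either way.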

While audio equipment and algorithms don't care about nice-looking numbers, I think the actually useful property is that 48000 has more favorable prime factors than 44100, which can help with resampling and other tasks.
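
To make "more favorable prime factors" concrete, a small standard-library sketch: a rational (L/M, polyphase-style) resampler from 44.1 kHz to 48 kHz has to upsample by 160 and downsample by 147, whereas 48 kHz to 96 kHz is a plain factor of 2. `prime_factors` and `resampling_ratio` are illustrative helpers, not part of any audio library, and this only shows the ratios involved, not how good any particular resampler is:

```python
from math import gcd

def prime_factors(n):
    """Trial-division factorisation: returns {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def resampling_ratio(src_hz, dst_hz):
    """Smallest integers (up, down) such that dst_hz / src_hz == up / down."""
    g = gcd(src_hz, dst_hz)
    return dst_hz // g, src_hz // g

print(prime_factors(48_000))             # {2: 7, 3: 1, 5: 3}
print(prime_factors(44_100))             # {2: 2, 3: 2, 5: 2, 7: 2}
print(resampling_ratio(44_100, 48_000))  # (160, 147)
print(resampling_ratio(48_000, 96_000))  # (2, 1)
```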

The same could be said about bit depth: 24 bits produces far fewer quantization artifacts than 16 bits, and those artifacts can readily show up during production processes such as dynamic range compression. But they are extremely well hidden by the dithering with noise shaping that gets applied during mastering, so ultimately listeners are fine either way.

However, any type of subsequent processing in the digital domain, even just a volume change by the listener if it's applied digitally in the 16 bit realm (i.e., without first upscaling to 24 bits), completely destroys the benefit of dithering. For that reason, we might say that additional processing isn't confined to the recording studio and can happen at the end user level.

I'm unsure whether this same logic applies to sampling frequency, but probably? I guess post-mastering processing of amplitude is far more common than time-based changes, but maybe DJs doing beat matching?

I detect some fallacy here.

The real benefit is not using 6x the network bandwidth, storage, memory, and processing power, plus more of the mobile device's battery. That benefit is not going anywhere, no matter what.
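
The 6x figure matches uncompressed PCM at 192 kHz/24-bit versus 48 kHz/16-bit, the pair named in the next paragraph. A quick check, assuming stereo (the ratio doesn't depend on channel count, and lossless compression would change the absolute numbers); `pcm_bitrate` is an illustrative helper:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

hi_res = pcm_bitrate(192_000, 24)   # 9_216_000 bit/s
standard = pcm_bitrate(48_000, 16)  # 1_536_000 bit/s
print(hi_res / standard)            # 6.0
```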

Post-processing is applied to a signal that is impossible for a human to distinguish from the source. It is true that processing often needs higher resolution, and DSPs will upsample internally, operate on floats, and then convert back. But to claim without evidence that post-processing may give a human listener back the ability to tell whether a 192/24 medium was used instead of 48/16 would be to reintroduce the same quality-loss paranoia, just with an extra step. If one couldn't hear the difference before an effect was applied, they won't hear it after.

As for DJs, they do use high-res assets when producing mixes. That's still mastering stage, technically.

With music, in particular, if you use any analog sources while recording, the signal will contain so much noise that any dithering signal will be far below the floor and will most likely be completely redundant. I know that people claim to hear a difference, but they also claim to hear a difference between gold and copper contacts.

I hear no difference between undithered 16-bit and anything "better" (e.g., dithered 16-bit, or more bits), and anyone who claims they do should be highly scrutinized, when we're talking about a system (media, DAC, amplification, transducer, human) playing a mastered recording at a moderate volume setting. But I certainly hear the difference (as quantization artifacts) when I crank the volume up to extremely high levels while the source material is extremely quiet: a fade-out, a reverb tail, or just something not properly mastered to use the full range. That is, a volume setting that would totally clip the amp, blow the speakers, or deafen me if it weren't a very quiet part of the recording.

Dithering (or more bits) does solve this. A fade-out also lowers the captured noise floor, but the dither keeps going.
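
A minimal sketch of why dither helps in that quiet-tail case, working in units of one 16-bit LSB with an idealized rounding quantizer and TPDF (triangular) dither. The 0.4 LSB amplitude is a hypothetical stand-in for the very end of a fade-out, and `quantize`, `tpdf_dither` and `tone_amplitude` are illustrative helpers:

```python
import numpy as np

rng = np.random.default_rng(0)

FS = 48_000
N = FS                      # one second of audio
AMPLITUDE_LSB = 0.4         # hypothetical: a tail quieter than one 16-bit step

t = np.arange(N) / FS
ref = np.sin(2 * np.pi * 1_000 * t)      # 1 kHz reference tone
x = AMPLITUDE_LSB * ref                  # signal expressed in LSB units

def quantize(signal):
    """Round to the nearest integer LSB (the 16-bit grid, in LSB units)."""
    return np.round(signal)

def tpdf_dither(n):
    """Triangular-PDF dither, +/-1 LSB peak: sum of two uniform(-0.5, 0.5)."""
    return rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)

plain = quantize(x)
dithered = quantize(x + tpdf_dither(N))

def tone_amplitude(y):
    """Amplitude of the 1 kHz component, by projecting onto the reference."""
    return 2 * np.mean(y * ref)

print(np.count_nonzero(plain))      # 0: the undithered tail vanishes entirely
print(tone_amplitude(dithered))     # close to 0.4, under benign hiss
```

Without dither every sample rounds to zero, so the tail simply stops; with dither the tone survives on average, traded for a constant, signal-independent noise floor.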

It's akin to noticing occasional posterization (banding) in very dark scenes if your TV isn't totally crushing the blacks. With a higher than recommended black level, you will see this artifact, because perceptual video codecs destroy (for efficiency purposes) the visual dither that would otherwise soften the bands of dark color into a nice grainy halftone sort of thing which would be much less offensive.

Bit depth is only useful for reducing the noise floor: the lower the bit depth, the higher the noise.
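
The usual rule of thumb for the quantization noise floor of a full-scale sine is SNR ≈ 6.02·N + 1.76 dB for N bits, about 6 dB per bit. A quick numerical check, assuming an idealized rounding quantizer with no clipping and no dither; `quantization_snr_db` and `theoretical_snr_db` are illustrative helpers:

```python
import numpy as np

def quantization_snr_db(bits, n=48_000, freq=997.0, fs=48_000.0):
    """Measured SNR of a full-scale sine rounded to a grid of step 2 / 2**bits."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * freq * t)
    step = 2.0 / 2**bits                    # one LSB for a [-1, 1) range
    q = np.round(x / step) * step           # ideal mid-tread quantizer
    noise = q - x
    return 10 * np.log10(np.mean(x**2) / np.mean(noise**2))

def theoretical_snr_db(bits):
    """Textbook full-scale-sine figure: 6.02 N + 1.76 dB."""
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(bits, round(quantization_snr_db(bits), 2),
          round(theoretical_snr_db(bits), 2))
```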

That’s why producers (mixing many tracks in a session) want to use high-bit-depth stems: they are summing the noise from n tracks.
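
A quick simulation of that summing, assuming each track's noise is independent and modeled as uniform rounding error: uncorrelated noise powers add, so n tracks raise the noise floor by about 10·log10(n) dB, roughly 3 dB (half a bit) per doubling of the track count. `noise_floor_rise_db` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_floor_rise_db(n_tracks, n_samples=50_000, lsb=1.0):
    """Sum n independent tracks of uniform +/-LSB/2 noise and report how far
    the summed noise power sits above that of a single track, in dB."""
    tracks = rng.uniform(-lsb / 2, lsb / 2, size=(n_tracks, n_samples))
    single = np.mean(tracks[0] ** 2)
    summed = np.mean(tracks.sum(axis=0) ** 2)
    return 10 * np.log10(summed / single)

for n in (2, 16, 64):
    print(n, round(noise_floor_rise_db(n), 2), round(10 * np.log10(n), 2))
```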

It’s a pointless exercise for DJs or anyone listening to a single source to use a higher bit depth.

> even just a volume change by the listener if it's applied digitally in the 16 bit realm ...

I think that "if" is doing a lot of heavy lifting here.

Maybe it's an uncommon scenario these days, but not too terribly long ago I think it was fairly typical for software audio players to be 16-bit and offer volume controls, and anything other than 100% would completely ruin most benefits of dither.

Not just eventually: many effects, such as basically any non-linear mapping like distortion, create overtones that immediately alias down if you are not oversampling. To avoid this you need either some DSP tricks or oversampling (usually a mix of both), since the aliasing can happen within just one step of an effects chain.
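
A minimal numpy sketch of that aliasing, using hard clipping as the nonlinearity. A 7 kHz tone clipped at 48 kHz has a 5th harmonic at 35 kHz, above Nyquist, which folds back to 48 - 35 = 13 kHz, a frequency that is not a harmonic of 7 kHz at all. The oversampled path clips at 8x the rate and uses an ideal FFT brick-wall filter before decimating; that filter is an assumption that only works because the test tone is exactly periodic in the window, where a real plugin would use a finite filter. All names and parameters are illustrative:

```python
import numpy as np

FS = 48_000          # output sample rate (Hz)
F0 = 7_000           # test tone (Hz); a 1 s window gives 1 Hz FFT bins
DRIVE = 3.0          # hypothetical drive into the clipper
ALIAS_HZ = 13_000    # 5th harmonic (35 kHz) folds back to 48 - 35 = 13 kHz

def hard_clip(x):
    """Memoryless nonlinearity: amplify, then clip to [-1, 1]."""
    return np.clip(DRIVE * x, -1.0, 1.0)

def tone(fs):
    """One second of the F0 sine at sample rate fs."""
    t = np.arange(fs) / fs
    return np.sin(2 * np.pi * F0 * t)

def clip_naive():
    """Clip directly at FS: harmonics above 24 kHz alias into the audio band."""
    return hard_clip(tone(FS))

def clip_oversampled(factor=8):
    """Clip at factor * FS, low-pass below FS / 2, keep every factor-th sample."""
    y = hard_clip(tone(FS * factor))
    spec = np.fft.rfft(y)
    # Ideal brick-wall low-pass (exact here only because the tone is periodic
    # in the window); bins are 1 Hz wide, so this drops everything >= 24 kHz.
    spec[FS // 2:] = 0.0
    y = np.fft.irfft(spec, n=len(y))
    return y[::factor]

def level_re_fundamental(x, freq_hz):
    """Magnitude at freq_hz relative to the F0 fundamental (1 Hz bins)."""
    mag = np.abs(np.fft.rfft(x))
    return mag[freq_hz] / mag[F0]

naive = level_re_fundamental(clip_naive(), ALIAS_HZ)
over = level_re_fundamental(clip_oversampled(), ALIAS_HZ)
print(f"13 kHz alias, naive:          {20 * np.log10(naive):6.1f} dB")
print(f"13 kHz alias, 8x oversampled: {20 * np.log10(over):6.1f} dB")
```

Oversampling lowers the 13 kHz alias a lot but not to zero: harmonics above the 192 kHz internal Nyquist frequency still fold back, so it is a reduction to an acceptable level rather than an elimination.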

A great explanation by Dan Worrall:

https://www.youtube.com/watch?v=-jCwIsT0X8M

Even the term "oversampling" implies that sampling beyond the Nyquist rate is excessive. I think you would agree that it isn't. It is necessary to sample well beyond the commonly accepted "Nyquist rate" in order to reconstruct the signal.

I'd phrase it differently.

Your signal contains all kinds of frequencies: Those you care about and those you don't want in your recording. You can't just sample at the Nyquist rate of the interesting frequency and expect all the other frequencies to vanish. They will mess with the frequencies you are actually interested in.
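
A tiny illustration, assuming ideal sampling with no anti-aliasing filter: a 30 kHz component sampled at 48 kHz produces exactly the same samples as an 18 kHz one, so once it is in the data it cannot be separated from the frequency you actually care about. The specific frequencies are chosen for illustration:

```python
import numpy as np

FS = 48_000
n = np.arange(64)                                   # sample indices

in_band = np.cos(2 * np.pi * 18_000 * n / FS)       # a tone we care about
ultrasonic = np.cos(2 * np.pi * 30_000 * n / FS)    # 30 kHz = FS - 18 kHz

# Once sampled at 48 kHz, the two tones are identical sample by sample.
print(np.allclose(in_band, ultrasonic))   # True
```

This is why unwanted content has to be filtered out (or the sample rate raised) before sampling rather than after.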

That is the term, however. You see it in many contexts where a higher sample rate is traded for some other desirable attribute. (For example, it's often desirable for an ADC to sample faster than the highest-frequency content you care about in an analog signal, both for the reasons detailed in the paper and because it can give you a lower-noise ADC. Delta-sigma converters are an extreme case of this, helped by a separate trick, noise shaping.)
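
A sketch of the lower-noise effect under the textbook assumption that quantization error is white and spread evenly from 0 to fs/2: sampling at an oversampling ratio (OSR) and then filtering back to the band of interest keeps only about 1/OSR of the noise power, roughly 3 dB per doubling. Noise shaping, as used in delta-sigma converters, pushes noise further out of band and is not modeled here; `in_band_noise_fraction` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(2)

def in_band_noise_fraction(osr, n=2**20):
    """Share of white quantization-model noise power below fs / (2 * osr)."""
    noise = rng.uniform(-0.5, 0.5, n)           # +/- 1/2 LSB error model
    power = np.abs(np.fft.rfft(noise)) ** 2     # bins span 0 .. fs/2
    band_edge = len(power) // osr               # keep only the band of interest
    return power[:band_edge].sum() / power.sum()

for osr in (1, 4, 16, 64):
    frac = in_band_noise_fraction(osr)
    print(f"OSR {osr:2d}: in-band noise {10 * np.log10(frac):6.2f} dB "
          f"(theory {-10 * np.log10(osr):6.2f} dB)")
```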

It's worth noting that it's a tradeoff, even in pure processing: almost all non-linear transfer functions create an infinite number of overtones, so it's impossible to avoid aliasing completely; you can only reduce it to some threshold that is acceptable for the application.

I think you're mixing up the effects of _sample rate_ and _bit depth_ here!

Everything you said about sample rate applies more to bit depth. Higher bit depth (bits per sample) results in a lower noise floor. When audio is digitally processed or resampled, a small amount of noise ("quantization distortion") is added, which accumulates with further processing. This can be mitigated by working at higher bit depths - which is why professional grade audio processing routinely uses 24 bit formats (for storage) and 32-bit or 64-bit floating point internally (for processing), even if the final delivery format is only 16 bit.
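
A rough sketch of that accumulation, assuming a hypothetical chain of 50 small gain changes: one path stays in float64, the other is rounded back to 16 bits after every stage. The 16-bit path's error starts near the textbook sqrt(1/12) ≈ 0.29 LSB and grows as stages pile up (roughly with the square root of the stage count for gains near 1), which is what keeping intermediates in floating point avoids. `to_16bit`, `run_chain` and `rms_error_lsb` are illustrative helpers:

```python
import numpy as np

rng = np.random.default_rng(3)
LSB = 1 / 32768                                       # one 16-bit step on [-1, 1)

def to_16bit(x):
    """Round to the nearest 16-bit level (kept as float; no clipping applied)."""
    return np.round(x / LSB) * LSB

source = to_16bit(rng.uniform(-0.25, 0.25, 48_000))   # hypothetical 16-bit input
gains = rng.uniform(0.9, 1.1, 50)                     # hypothetical 50-stage chain

def run_chain(x, stages, requantize):
    """Apply each gain in turn; optionally round to 16 bits after every stage."""
    y = x
    for g in stages:
        y = y * g
        if requantize:
            y = to_16bit(y)
    return y

def rms_error_lsb(k):
    """RMS difference, in LSBs, between the 16-bit and float paths after k stages."""
    reference = run_chain(source, gains[:k], requantize=False)  # float64 path
    stored16 = run_chain(source, gains[:k], requantize=True)    # 16-bit each stage
    return np.sqrt(np.mean((stored16 - reference) ** 2)) / LSB

for k in (1, 10, 50):
    print(f"{k:2d} stages: RMS error {rms_error_lsb(k):.2f} LSB")
```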

Sample rate, on the other hand, affects bandwidth. A higher sample rate recording will contain higher frequencies. It doesn't have any direct effect on the noise floor or level of distortion introduced by resampling, as I understand. (It could have an indirect effect - for example, if certain hardware or plugins work better at particular sample rates.)

A survey of ~2,000 professional audio engineers done in May 2023 showed that 75% of those working in music use 44.1 kHz or 48 kHz, while 93% of those working in post-production use 44.1 kHz or 48 kHz.[1] These are the basic CD-derived and video-derived sample rate standards.

From this it's clear that even in professional audio, higher sample rates are a minority pursuit. Furthermore, the differences are extremely subjective. Some audio engineers swear by higher sample rates, while others say it's a waste of time unless you're recording for bats. It's very rare (and practically, quite difficult) to do proper tests to eliminate confirmation bias.

[1] https://www.production-expert.com/production-expert-1/sample...

EDIT: add link to survey.

Yeah, a lot of people think “Nyquist” is a synonym for 2 and stop thinking further.

Heh. Then they don't actually understand what it implies.

Which makes sense I suppose.

Also, when the sampling rates get extreme (software-defined radio), it is well worth moving to complex samples. Doing so allows you to use a sampling rate equal to your theoretical maximum bandwidth, instead of 2x. That's not such a big deal at audio bandwidth, but when your Airspy is slinging a 6 MHz chunk of spectrum, it becomes a huge deal.
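
Why complex samples buy back the factor of 2: a real signal's spectrum is symmetric, so real samples at rate fs only describe 0 to fs/2, while complex I/Q samples at fs distinguish positive from negative frequencies and cover the full -fs/2 to +fs/2, a bandwidth of fs. Each complex sample carries two real numbers (I and Q), so the saving is in sample rate, not in raw numbers stored. A minimal illustration with hypothetical numbers, not tied to any particular radio:

```python
import numpy as np

FS = 1_000_000          # hypothetical 1 MHz sample rate
F = 200_000             # tone 200 kHz above or below the tuned centre
t = np.arange(32) / FS

# Real samples cannot tell +F from -F: cos is even, so both give the same data.
real_pos = np.cos(2 * np.pi * F * t)
real_neg = np.cos(2 * np.pi * -F * t)

# Complex (I/Q) samples keep the sign of the frequency.
iq_pos = np.exp(2j * np.pi * F * t)
iq_neg = np.exp(2j * np.pi * -F * t)

print(np.allclose(real_pos, real_neg))   # True: ambiguous
print(np.allclose(iq_pos, iq_neg))       # False: distinguishable
```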

Another factor I don't see mentioned is that the audio signal is not always going directly to your ear. Is it possible for sound bouncing around the room to break these assumptions?

This is all about representing signals inside a computer. Audio played from a speaker (or as it exists in the physical domain) is continuous, and your ear doesn't have a sample rate, so there's no concept of a Nyquist limit or aliasing with physical sound.

Higher sampling rates make it easier to identify non-sound disturbances, such as mechanical vibration or electrical interference, which can show up at multiple orders (harmonics) of some base frequency.