Hacker News
by ZoomZoomZoom 1961 days ago
Sound engineer here.

RNNoise is an amazing feat, but please, don't overdo it. Most of the time you don't really want complete elimination of ambient noise, as human speech emerging from dead silence sounds unnatural. Moreover, most noise reduction software is considerably less effective while a person is speaking: it either removes too much, degrading the speech (the worst case), or too little. If possible, always add noise reduction gradually, stop when it sounds good to your ear, and then back it off a bit.
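One way to "back it off a bit" is to cap how far a noise reducer may pull the level down, so some room tone always survives. A minimal sketch, assuming a hypothetical denoiser that hands you linear suppression gains between 0 and 1 per frame or per band; the function names are illustrative and not part of RNNoise or OBS.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def limit_reduction(suppressor_gains, max_reduction_db):
    """Clamp suppression gains so noise is never attenuated by more than
    max_reduction_db. Gains are linear values in [0, 1] from some
    (hypothetical) denoiser; 1.0 means "leave untouched"."""
    floor = db_to_gain(-max_reduction_db)
    return [max(g, floor) for g in suppressor_gains]


def apply_gains(samples, gains):
    """Multiply each sample by its corresponding gain."""
    return [s * g for s, g in zip(samples, gains)]
```

With a 12 dB limit, a gain the denoiser wanted to set to 0.01 (about -40 dB) or 0.0 (full mute) is held at roughly 0.25, while gains above that floor pass through unchanged. Raising the limit step by step is the "add it gradually" approach in numeric form.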

If you're doing voice recording or streaming, please get to know expansion and compression first, and add noise reduction only after you've configured the rest of your processing chain.
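For readers new to these two processors, here is a minimal sketch of their static (hard-knee) gain curves. It deliberately leaves out level detection and attack/release smoothing, which every real unit has, and the thresholds and ratios used below are only example values, not recommendations.

```python
def expander_gain_db(level_db, threshold_db, ratio):
    """Downward expander: below the threshold, each 1 dB the input drops
    becomes `ratio` dB of output drop. Returns the gain change in dB (<= 0)."""
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


def compressor_gain_db(level_db, threshold_db, ratio):
    """Compressor: above the threshold, each `ratio` dB of input rise yields
    only 1 dB of output rise. Returns the gain change in dB (<= 0)."""
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)
```

For example, a 2:1 expander with a -40 dB threshold turns a -50 dB input into -60 dB (a -10 dB gain change), and a 4:1 compressor with a -20 dB threshold turns a -8 dB input into -17 dB (a -9 dB gain change). Unlike a gate, the expander pushes quiet room noise further down without cutting it to silence.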

One of the serious offenders is OBS Studio, which recently added an RNNoise filter but provides no means of mixing the processed sound with the dry one (in other words, the filter is always 100% on). A wet/dry mix knob is badly needed for most of the filters there.

I'm very saddened by the state of sound quality in lots of amazing videos people have been producing lately, and I'm now considering writing a guide to voice processing for streams, conferences, etc. for techy people, if anyone's interested.


Great post.

I'm also an audio engineer. This is the truth.

In an audio recording featuring spoken voice, there are two sounds present in every recording: the spoken voice and the room ambiance in the background. We typically refer to the latter as "room tone."

Even though we don't usually realize this explicitly, our ears and brain do so implicitly. So when people overdo noise removal, we implicitly hear the difference, since half of the sounds that make up the filtered output are now gone. We tend to associate such recognizable "noise gating" with lower production quality, and we find that such processing generally lowers the intelligibility of the human voice.

The addition of an artificial ambient background is known as "comfort noise", for those interested in looking further into it; it's usually done on the receiving end.
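As a hedged illustration of the idea, this sketch fills non-speech frames with low-level white noise instead of digital silence. It assumes a speech/non-speech decision is already available per frame; the -60 dBFS default and the function names are arbitrary examples, and practical comfort-noise schemes usually shape the noise to resemble the actual background rather than using flat white noise.

```python
import random


def comfort_noise(num_samples, level_dbfs=-60.0, rng=None):
    """White Gaussian noise whose expected RMS equals level_dbfs,
    relative to a full-scale amplitude of 1.0."""
    if rng is None:
        rng = random.Random()
    rms = 10.0 ** (level_dbfs / 20.0)
    # A zero-mean Gaussian with standard deviation `rms` has that RMS on average.
    return [rng.gauss(0.0, rms) for _ in range(num_samples)]


def fill_silence(frame, is_speech, level_dbfs=-60.0, rng=None):
    """Pass speech frames through unchanged; replace the rest with comfort noise."""
    if is_speech:
        return list(frame)
    return comfort_noise(len(frame), level_dbfs, rng)
```

Passing one shared `random.Random` instance across calls keeps consecutive noise frames from repeating the same pattern.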

I'd be quite interested in such an article. Again, my goal (besides VoIP) is screencasting and/or streaming, so any advice from someone with experience would be very useful.

I'll look into expansion and compression. I could also implement a wet/dry setting that multiplies the source samples by a mix factor and then mixes them into the result, if I've understood the concept right.
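A wet/dry control is commonly a linear crossfade between the unprocessed and processed signals. A minimal sketch, assuming both buffers are the same length and already time-aligned; `wet_dry_mix` is a hypothetical name.

```python
def wet_dry_mix(dry, wet, mix):
    """Blend unprocessed (dry) and processed (wet) samples.

    mix = 0.0 gives only the dry signal, mix = 1.0 only the wet one.
    The two weights sum to 1, so identical inputs come out at the same level.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0 and 1")
    if len(dry) != len(wet):
        raise ValueError("dry and wet buffers must be the same length")
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]
```

Because the weights always sum to 1, blending two copies of the same signal never raises its level, so there is no need to turn the processed audio down first.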

EDIT: RNNoise seems to be alright at canceling noise during speech too; I didn't notice it overdoing it.

> I could implement a wet/dry setting that multiplies the source samples and then mixes them into the result, if I understood the concept right.

Haven't tested your version yet, but the werman/noise-suppression-for-voice plugin introduces some delay, so a naive wet/dry control (or mixing with the original sound source in some other way) doesn't work. It might turn out to be not so simple.

Right now there's no such feature in place, but I imagine keeping the buffer from before denoising and mixing it into the denoised result (plus the multiplication) will do what you're describing? It may increase the volume, so I might need to reduce the level of the denoised audio first. I'll play around with it, and I'm open to hearing what you have to say about it.
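If the denoiser adds latency, mixing the raw pre-denoise buffer straight in misaligns the two paths, and summing a signal with a delayed copy of itself causes comb-filter coloration. One hedged approach is to delay the dry path by the same amount before blending. This sketch assumes the latency in samples is already known (reported by the plugin or measured); the class name is hypothetical.

```python
from collections import deque


class DelayCompensatedMix:
    """Wet/dry mixer that delays the dry signal by the processor's latency
    so both paths line up sample-for-sample before mixing.

    latency_samples must be supplied by the caller; it is not detected here.
    """

    def __init__(self, latency_samples, mix):
        self.mix = mix
        # Pre-filled with silence, so the first `latency_samples` dry
        # outputs are zeros, matching the processor's own startup delay.
        self.dry_delay = deque([0.0] * latency_samples)

    def process(self, dry_block, wet_block):
        """Mix one block; state carries over between calls."""
        out = []
        for d, w in zip(dry_block, wet_block):
            self.dry_delay.append(d)
            delayed_d = self.dry_delay.popleft()
            out.append((1.0 - self.mix) * delayed_d + self.mix * w)
        return out
```

If the wet path is the dry signal delayed by two samples, a mixer created with `latency_samples=2` lines the paths up again, so a 50% mix reproduces the wet signal instead of smearing it.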

I wouldn't be too worried about it unless you're working on something at a level where you know why to be worried about it (i.e. you're mixing audio as part of what you're doing, not just because you need the audio output to work). For instance, I'd take missing comfort noise 10 times over everyone hearing my water heater kick in once on a conference call or while playing a team shooter.

That being said, RNNoise doesn't so much filter background noise as guess when to drop the levels, and, as you mention, it doesn't block much once it detects you're speaking; it just lets nearly everything through until you stop.

RTX Voice set the gold standard in filtering, IMO, and as amazing a feat as RNNoise is (I certainly couldn't do better), it's just not that good in comparison. I'm not sure what they did to make their model so good, but I can use a boom mic set to omni, run a fan at high speed into the mic, bang on the desk repeatedly with one hand, and have the water heater making noise, my phone vibrating on the table, a car alarm going in the background, the cat scratching a post, and so on, and as long as I remember to talk at a normal volume it's damn near indistinguishable from talking in a quiet room. It may sound preposterous, or like I'm exaggerating for effect, but I'll be damned if it doesn't actually filter that well. I didn't believe it until I tried it. It finally gets "bad" when the noise on the microphone is so loud that your voice starts to sound a bit distorted, but even then the voice is still isolated. It does let cat meows through, though that is technically voice, and I'm not sure how you could identify a meow without massive latency to hear the whole thing first.

That being said, they seem to have completely fucked something up porting it to Nvidia Broadcast, as the mic filtering there leaks so much it's as if it weren't even on.

UPD: I've written the guide. *Voice recording and processing for talks, streaming and conferencing. The Reference.*

I'm not so good with short names and the post itself is pretty long.

Here's the link: https://indiscipline.github.io/post/voice-sound-reference/

I think pretty much everyone who does A/V production (and some people who don't, like me) would be interested in such a guide. Please do write it!

Your guide would be a blessing for techies looking to improve their audio quality. Please, do it!

> RNNoise is an amazing feat, but please, don't overdo it. Most of the time, you don't really want complete ambient noise elimination, as human speech appearing from dead silence sounds unnatural.

No. Most sane programs don't do comfort noise, because it is anything but comforting. Data should be transmitted iff (if and only if) you speak.
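Transmitting only while someone is speaking is usually called discontinuous transmission (DTX), driven by a voice activity detector (VAD). Below is a deliberately crude sketch of that per-frame decision, using a frame-energy threshold plus a short hangover so word endings aren't chopped off. The threshold, hangover length and function names are illustrative assumptions; production VADs are considerably more sophisticated.

```python
import math


def frame_rms_dbfs(frame):
    """RMS level of a frame in dBFS (full scale = 1.0); silence maps to -inf."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    # 20*log10(rms) is the same as 10*log10(mean square).
    return 10.0 * math.log10(mean_square)


def frames_to_send(frames, threshold_dbfs=-45.0, hangover_frames=5):
    """Return one True/False per frame: send while the level is above the
    threshold, and keep sending for `hangover_frames` frames afterwards."""
    decisions = []
    remaining = 0
    for frame in frames:
        if frame_rms_dbfs(frame) > threshold_dbfs:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Frames marked False are simply not sent; whether the receiver then plays silence or generates comfort noise is a separate choice on the receiving end.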