by konschubert 2777 days ago
I am really impressed with what Nvidia is doing here.

I think there is a huge market for improving sound quality in video calls.

For me, roughly every second call I make is harmed by some kind of "bad audio" problem: breathing, reverb, noise, clipping, low volume. There are so many things that can go wrong.

And this really harms the productivity of video calls.

I have started building tools to detect all of these sources of bad audio and am collecting them at https://www.tinydrop.io. Maybe these APIs can help people improve their setup. But if software like Nvidia's comes along and just fixes the problem once and for all, that's great as well!

5 comments

Disclosure: I'm the author of the blog post and co-founder at 2Hz.

This is a guest post on NVIDIA Developer Blog. The author of the technology is a startup called 2Hz (2hz.ai). Our passion is to improve voice audio quality in audio/video calls. It's a tough problem but also fun to work on.

Agree, breathing, reverb, noise are all problems and should be fixed. We started with noise and already shipped a product you can try on your Mac. The app is called Krisp (krisp.ai).

Reverb, breathing, voice cutting will come next.

Hi! As someone who seems to struggle more than most to understand people on video calls, I'd like to give you my impressions.

Something struck me about the sample video. The very first sample included background noise, but it was very easy to understand regardless of the noise, probably because it was recorded by a pro microphone rather than a phone. Every other sample was far more difficult, regardless of noise removal. Noise removal doesn't really seem to help; in fact, any imperfections in the noise removal process actually make the audio more difficult to understand because I have to guess not only the speaker's voice and the noise but also the algorithm for noise removal.

What does help me is low frequency pickup. I think the first sample is easy because there are plenty of low frequency components that are later lost through the phone.

Low frequencies are presumably difficult to pick up due to the size of the microphone in a phone, but could there be a way to restore those frequencies through audio processing? It would be interesting to analyze the response of specific microphones to specific low frequencies and find patterns that an audio processor could use to restore the low frequency components.

Anyway, kudos for doing some very interesting work. I don't know how representative my experience is.

In my experience it's the loss (or masking) of high frequencies that is the most problematic for understanding speech. The most important sounds in speech are consonants, which are higher frequency sounds. Combine this with foreign accents, and more often than not conference calls quickly degenerate into an unintelligible babble (for me, at least).

> I don't know how representative my experience is.

As someone who works with speech content, this seems unusual. Typically, low frequencies are reduced because there's not much useful voice signal there—for example, NPR typically rolls off frequencies below 250 Hz.
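
To make the roll-off idea concrete, here is a minimal sketch of a high-pass filter with a 250 Hz cutoff. It uses a simple one-pole RC-style filter in pure Python, which is an assumption chosen for brevity and is far gentler than whatever a broadcast chain would actually use. The names `high_pass`, `tone`, and `rms` and the 8 kHz sample rate are illustrative, not taken from the comment above.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=250.0):
    """One-pole high-pass filter: y[i] = alpha * (y[i-1] + x[i] - x[i-1])."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant for the cutoff
    dt = 1.0 / sample_rate                   # sampling interval
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def tone(freq_hz, sample_rate, n):
    """Synthetic sine wave standing in for recorded audio."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SAMPLE_RATE = 8000  # assumed sample rate, typical for telephony
low = tone(50.0, SAMPLE_RATE, SAMPLE_RATE)     # 1 s of low-frequency hum
high = tone(2000.0, SAMPLE_RATE, SAMPLE_RATE)  # 1 s in the consonant range

# Fraction of the original level that survives the filter.
low_ratio = rms(high_pass(low, SAMPLE_RATE)) / rms(low)
high_ratio = rms(high_pass(high, SAMPLE_RATE)) / rms(high)
print(f"50 Hz kept: {low_ratio:.2f}, 2 kHz kept: {high_ratio:.2f}")
```

With these settings the 50 Hz hum should come out at roughly a fifth of its original level, while the 2 kHz tone passes mostly intact. A steeper filter would cut more below the cutoff, at the cost of more code.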

Thanks for your viewpoint!

Here's something concrete: the first phrase in the video ends with "small demonstration", but starting with the second instance, I distinctly hear "sall" instead of "small". In the version with the noise, the "m" sounds like an aberration of the noise and is detectable. With the noise removed, the "m" is replaced with a blip that sounds like an encoding error.

Hi, don't know if you're taking unsolicited requests:

But here are some toggle options I would want a system like this to do (enabled by default):

* Do not send whispers. If I am a primary speaker, and I switch to address someone local to my side of the call via a whisper, that audio should be effectively muted to the other side.

* Focus muting. If I look away from the screen and begin addressing someone off camera, away from the mic, mute that as well.

* Bark and siren filtering. Specifically able to ID and mute barks and sirens. (Planes, motorcycles and trucks would be awesome)

---

What is your impression of the company and app Temi?

There is a massive market for improving audio quality, especially for professional gaming. This would be a huge leap forward.

Sorry, I noticed that too late. I'm currently reading the paper linked in the blog post. Are there other resources on the topic that you can recommend?

Impressive results! Any plans for a Windows/Linux version? Does it also cancel noise on incoming audio?

Windows will come soon. Linux - no plans yet. Yes, it cleans the incoming audio as well.

Ironically I've had the audio cleanup filters get in the way.

I was trying to make a program which would FFT sounds from my mic and trigger on certain frequencies or combinations of frequencies. The idea was to have audio files on my phone act as a sort of poor man's remote control. Yes, I know there are Wi-Fi and Bluetooth ways to do it, but I wanted to experiment a bit with sound.

Anyway, I'm pissing around for hours with packages and settings and I can't get the damn thing to work. Turns out my computer came with some super fancy Beats Audio™ sound system which actively suppresses microphone input of constant frequency, under the assumption that it's an unwanted buzzing sound.

All features are someone else's bug.
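
For anyone curious what the frequency-trigger idea could look like, here is a minimal sketch in pure Python. It is not the original program: synthetic sine waves stand in for microphone input, and the names `detect_trigger`, `TRIGGERS`, the trigger frequencies, and the 20 Hz tolerance are all made up for illustration.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin below Nyquist, skipping DC."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)

def detect_trigger(samples, sample_rate, triggers, tolerance_hz=20.0):
    """Return the name of the trigger matching the dominant tone, or None."""
    peak = dominant_frequency(samples, sample_rate)
    for name, freq in triggers.items():
        if abs(peak - freq) <= tolerance_hz:
            return name
    return None

def sine(freq_hz, sample_rate, n):
    """Synthetic tone standing in for a block of microphone samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

SAMPLE_RATE = 8000   # assumed sample rate
N = 1024             # block size, a power of two for the FFT
TRIGGERS = {"lights_on": 1000.0, "lights_off": 2000.0}  # hypothetical commands

print(detect_trigger(sine(1000.0, SAMPLE_RATE, N), SAMPLE_RATE, TRIGGERS))
print(detect_trigger(sine(3000.0, SAMPLE_RATE, N), SAMPLE_RATE, TRIGGERS))
```

A real version would read blocks from the microphone and add an energy threshold so silence does not count as a tone. As the story above shows, it would also need the audio driver to leave constant tones alone.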

And don't forget hearing aids. That's a market that's only going to keep getting bigger over time. The first people to ship a super-power-efficient ASIC for wind/restaurant denoising which allows reasonable hearing aid battery life are going to make a well-deserved fortune.

While not nearly as impressive, Chrome's WebRTC AEC stuff (also available via PulseAudio [1]) works pretty well.

[1] - pactl load-module module-echo-cancel aec_method=webrtc

Some unsolicited feedback: when you list your features, could you put the text under the image? It's a bit tedious to click on each image to get a quick idea about the product. The text is worth a lot more than the picture but is completely hidden.

Hey, yeah, this is a terrible experience currently and I am planning to fix it, of course. I'm simply still using the stock images that came with the theme...