| HN Mirror

by davitb 2777 days ago

Disclosure: I'm the author of the blog post and co-founder at 2Hz.

This is a guest post on NVIDIA Developer Blog. The author of the technology is a startup called 2Hz (2hz.ai). Our passion is to improve voice audio quality in audio/video calls. It's a tough problem but also fun to work on.

Agreed: breathing, reverb, and noise are all problems and should be fixed. We started with noise and have already shipped a product you can try on your Mac. The app is called Krisp (krisp.ai).

Reverb, breathing, voice cutting will come next.

5 comments

hathawsh 2777 days ago

Hi! As someone who seems to struggle more than most to understand people on video calls, I'd like to give you my impressions.

Something struck me about the sample video. The very first sample included background noise, but it was very easy to understand regardless of the noise, probably because it was recorded by a pro microphone rather than a phone. Every other sample was far more difficult, regardless of noise removal. Noise removal doesn't really seem to help; in fact, any imperfections in the noise removal process actually make the audio more difficult to understand because I have to guess not only the speaker's voice and the noise but also the algorithm for noise removal.

What does help me is low frequency pickup. I think the first sample is easy because there are plenty of low frequency components that are later lost through the phone.

Low frequencies are presumably difficult to pick up due to the size of the microphone in a phone, but could there be a way to restore those frequencies through audio processing? It would be interesting to analyze the response of specific microphones to specific low frequencies and find patterns that an audio processor could use to restore the low frequency components.

Anyway, kudos for doing some very interesting work. I don't know how representative my experience is.

matrix 2777 days ago

In my experience it's the loss (or masking) of high frequencies that is most problematic for understanding speech. The most important sounds in speech are consonants, which are higher frequency sounds. Combine this with foreign accents, and more often than not conference calls quickly degenerate into unintelligible babble (for me, at least).

CharlesW 2777 days ago

> I don't know how representative my experience is.

As someone who works with speech content, this seems unusual. Typically, low frequencies are reduced because there's not much useful voice signal there—for example, NPR typically rolls off frequencies below 250 Hz.
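To make the roll-off idea concrete, here is a minimal sketch of a first-order high-pass filter with a 250 Hz cutoff. It is an illustration only: the function names (`high_pass`, `rms`, `sine`) are hypothetical, the 16 kHz sample rate in the usage note is an assumption, and a simple RC-style filter like this one is not claimed to be what any broadcaster actually uses.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=250.0):
    """First-order RC-style high-pass filter (roughly 6 dB/octave below cutoff).

    Hypothetical helper for illustration; the 250 Hz default mirrors the
    figure mentioned in the comment above.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # analog time constant
    dt = 1.0 / sample_rate                   # sampling interval
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        # Pass changes in the input through; let steady (low-frequency) content decay.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, sample_rate, seconds=0.5):
    """Generate a unit-amplitude test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

Feeding it a 50 Hz tone and a 2 kHz tone at an assumed 16 kHz sample rate and comparing `rms(high_pass(tone, 16000)) / rms(tone)` for each should show the low tone strongly attenuated while the high tone passes nearly unchanged.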

hathawsh 2777 days ago

Thanks for your viewpoint!

Here's something concrete: the first phrase in the video ends with "small demonstration", but starting with the second instance, I distinctly hear "sall" instead of "small". In the version with the noise, the "m" sounds like an aberration of the noise and is detectable. With the noise removed, the "m" is replaced with a blip that sounds like an encoding error.

samstave 2777 days ago

Hi, don't know if you're taking unsolicited requests:

But here are some toggle options I would want a system like this to do (enabled by default):

* Do not send whispers. If I am a primary speaker, and I switch to address someone local to my side of the call via a whisper, that audio should be effectively muted to the other side.

* Focus muting. If I look away from the screen and begin addressing someone off camera, away from the mic, mute that as well.

* Bark and siren filtering. Specifically able to ID and mute barks and sirens. (Planes, motorcycles and trucks would be awesome)
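As a rough illustration of the first toggle, a whisper mute could start from a plain loudness gate: split the audio into short frames and silence any frame whose level falls below a threshold. This is only a sketch under loud assumptions: `gate_quiet_frames`, the 320-sample frame (20 ms at an assumed 16 kHz), and the 0.05 threshold are all hypothetical, and loudness alone cannot tell a real whisper from a quiet talker or a distant one.

```python
import math

def gate_quiet_frames(samples, frame_len=320, threshold_rms=0.05):
    """Mute frames whose RMS level is below threshold_rms.

    Hypothetical sketch: frame_len and threshold_rms are illustrative
    values, not tuned settings from any real product.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = math.sqrt(sum(s * s for s in frame) / len(frame))
        if level < threshold_rms:
            out.extend([0.0] * len(frame))   # quiet frame: send silence instead
        else:
            out.extend(frame)                # loud enough: pass through unchanged
    return out
```

A practical version would likely need smoothing between frames to avoid audible clicks at the gate boundaries, and some classifier beyond raw level to handle the bark and siren cases.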

---

What is your impression of the company and app Temi?

nyxtom 2777 days ago

There is a massive market for improving audio quality especially for professional gaming. This would be a huge leap forward.

konschubert 2777 days ago

Sorry, I noticed that too late. I'm currently reading the paper linked in the blog post. Are there other resources on the topic that you can recommend?

antman 2777 days ago

Impressive results! Any plans for a Windows/Linux version? Does it also cancel noise on incoming audio?

davitb 2776 days ago

Windows will come soon. Linux: no plans yet. Yes, it cleans the incoming audio as well.
