Hacker News
by ssalazar 3624 days ago
I agree but

> voice recognition needs the full spectrum for accuracy

do you have a source for this? Voice signals are conventionally low-bandwidth; 16kHz is usually "good enough" for human-human transmission. Formant frequencies top out around 3kHz [1] and upper vocal harmonics are not really important outside musical applications. Consonants are a bit more complicated but I'd be interested to know what voice information is present above 20kHz.

[1] https://en.wikipedia.org/wiki/Formant#Formants_and_phonetics
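
To make the bandwidth arithmetic concrete, here is a minimal sketch. It assumes the 16kHz figure above is a sample rate (the comment does not say whether it means sample rate or bandwidth), and the helper names are hypothetical. A sampled signal can only represent frequencies below half its sample rate, so the roughly 3kHz formant ceiling fits comfortably even at telephone rates.

```python
# Illustrative only: helper names are hypothetical, and the 16 kHz figure
# is assumed to be a sample rate rather than a bandwidth.

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent (half the sample rate)."""
    return sample_rate_hz / 2.0


def is_representable(frequency_hz: float, sample_rate_hz: float) -> bool:
    """True if a component at frequency_hz lies below the Nyquist limit."""
    return frequency_hz < nyquist_hz(sample_rate_hz)


FORMANT_CEILING_HZ = 3_000  # approximate upper formant frequency cited above

if __name__ == "__main__":
    for rate in (8_000, 16_000, 44_100):
        print(
            f"{rate} Hz sampling: Nyquist limit {nyquist_hz(rate):.0f} Hz, "
            f"formants captured: {is_representable(FORMANT_CEILING_HZ, rate)}, "
            f"20 kHz content captured: {is_representable(20_000, rate)}"
        )
```

Under that assumption, anything above 20kHz is out of reach even at 44.1kHz sampling, which is why the question of what voice information lives up there matters.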

1 comments

yeah, i can't hear anything past 13.5kHz and can understand speech just fine. can't imagine why a computer couldn't.
Opus works great for speech recognition but I wanted to point out how your argument doesn't support the conclusion logically.

Let's imagine that human speech had the nearly unique property of carrying a whole second copy of the speech as loud ultrasonic overtones at 10x the normal frequency.

You couldn't hear them, and yet you would hear speech fine. But a computer could make good use of the ultrasound portion, and maybe understand speech much better than you as a result.

This isn't how it works in reality, but it does show a flaw in your logic.

The argument is that if a human brain can recognise speech accurately without needing your hypothetical ultrasonic overtones, why would a computer need them? Not to mention that most mid-range microphones won't pick up such overtones anyway. There isn't a flaw in their logic; you're just arguing that there might be more information that a computer can use, but the fact that we don't need it leads to the conclusion that a computer doesn't need it either.