Hacker News
by jws 2074 days ago
That takes care of half of the problem.

You will be a complete hero if you write up how to do "blind signal separation" with an economical microphone array. Such a device can be ceiling mounted and use copious heaps of mathematics to locate speaking people and isolate just their voices, attenuating other room noise and echoes.

There are likely affordable hardware candidates: the $55 ESP32-Lyra-TD development board with 3 microphones, or the MATRIX Voice with 8 microphones at $75. Both have special hardware for the signal processing.

1 comments

You will almost certainly need a neural accelerator for this; neural networks are currently the only known effective solution for the cocktail party problem.
If you have many microphones then you are probably just beamforming, which is pure math.
Some of these neural network things can track the speaker's face. So not necessary, but it might make things more interesting.
How did the world arrive at this, that we think NNs solve everything? Beamforming requires zero NNs and works wonderfully (it is all math).
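
To make the "it is all math" point concrete, here is a minimal sketch of the simplest beamformer, delay-and-sum, in plain Python. Everything in it is an assumption for illustration: the 16 kHz sample rate, the 4-microphone array, the two test tones, and the per-microphone arrival delays (given in whole samples and assumed to be known already). A real array would have to estimate those delays from its geometry or from the signals, and would need fractional delays and handling of reverberation.

```python
import math

FS = 16000  # sample rate in Hz (assumed for this sketch)


def tone(freq, t):
    """Value of a unit-amplitude sine of `freq` Hz at sample index t."""
    return math.sin(2 * math.pi * freq * t / FS)


def delay_and_sum(channels, delays):
    """Undo each channel's arrival delay (in whole samples) and average."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # where this mic recorded what the source emitted at t
            if 0 <= src < n:
                out[t] += signal[src]
    return [v / len(channels) for v in out]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical scene: a 440 Hz "speaker" and a 1000 Hz "noise" source reach
# 4 microphones from different directions, so with different delay patterns.
N = 1600
TARGET_DELAYS = [0, 2, 4, 6]  # samples; assumed known
NOISE_DELAYS = [9, 6, 3, 0]   # samples; not used by the beamformer itself

channels = [
    [tone(440, t - dt) + tone(1000, t - dn) for t in range(N)]
    for dt, dn in zip(TARGET_DELAYS, NOISE_DELAYS)
]

# Steer toward the speaker: the 440 Hz copies line up and add coherently,
# while the 1000 Hz copies end up out of phase and partly cancel.
steered = delay_and_sum(channels, TARGET_DELAYS)

inner = range(16, N - 16)  # skip edges where some channels have no sample
err_single = rms([channels[0][t] - tone(440, t) for t in inner])
err_steered = rms([steered[t] - tone(440, t) for t in inner])
print(f"residual noise, one mic: {err_single:.3f}")
print(f"residual noise, steered: {err_steered:.3f}")
```

The steered output should show noticeably less residual noise than any single microphone. Note that this only attenuates sound from other directions; it does not separate sources blindly, since the steering delays are handed to it.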