|
The way this works (and I'm obviously taking a high level view here) is by comparing what is being played to what is being captured. There is an inherent latency in between what is called the capture stream (the mic) and the reverse stream (what is being output to the speakers, be it people taking or music or whatever), and by finding this latency and comparing, one can cancel the music from the speech captured. Within a single process, or tree of processes that can cooperate, this is straightforward (modulo the actual audio signal processing which isn't) to do: keep what you're playing for a few hundreds milliseconds around, compare to what you're getting in the microphone, find correlations, cancel. If the process aren't related there are multiple ways to do this. Either the OS provides a capture API that does the cancellation, this is what happens e.g. on macOS for Firefox and Safari, you can use this. The OS knows what is being output. This is often available on mobile as well. Sometimes (Linux desktop, Windows) the OS provides a loopback stream: a way to capture the audio that is being played back, and that can similarly be used for cancellation. If none of this is available, you mix the audio output and perform cancellation yourself, and the behaviour your observe happens. Source: I do that, but at Mozilla and we unsurprisingly have the same problems and solutions. |
>The missile knows where it is at all times. It knows this because it knows where it isn't. By subtracting where it is from where it isn't, or where it isn't from where it is (whichever is greater), it obtains a difference, or deviation
https://knowyourmeme.com/memes/the-missile-knows-where-it-is