by modeless
38 days ago
> Humans will naturally prefer the auditory experience of an occasional dropped packet, vs backed up audio or audio that plays at an uneven rate

Yes, but the difference here is that there is only one human in the conversation. The other side can tolerate a 200ms delay in receiving or sending perfectly well, because it is not constrained to run in exactly real time the way a human brain is. I think he is right. This is an interesting point that I hadn't considered before. The reason we skip 200ms of audio instead of pausing for 200ms when packets go missing in a WebRTC call is that we can't pause the human on the other side of the call. But we can pause an AI just fine.
This isn't about pausing anyone; it's about doing faster-than-realtime processing after a delay event. Humans can do that to some extent, and some voice applications, such as Microsoft Teams, actually do it: after a network interruption, the buffered audio is sometimes played back noticeably faster until playback has caught up to real time again.
I hope it's an intentional design decision, because it works really well (for me). I can often keep track of a conversation perfectly in spite of the network delay. As much as I hate Teams, its meetings and voice implementation (noise cancellation too) work quite well, especially compared to current open source solutions like Jitsi or BigBlueButton.
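To make the "play it back faster until it's real-time again" idea concrete, here is a minimal sketch of the arithmetic, not how Teams actually does it. It assumes a receive buffer holding a backlog of audio, new audio arriving at 1x, and playback at some rate above 1x. The backlog then shrinks by (rate - 1) seconds per wall-clock second. The names `catch_up_seconds`, `choose_rate`, and `simulate`, along with the 60ms target and 1.5x cap, are hypothetical choices for illustration. A real client would also need pitch-preserving time-stretching (e.g. WSOLA), which this sketch does not model.

```python
def catch_up_seconds(backlog_s: float, rate: float) -> float:
    """Wall-clock time to drain `backlog_s` seconds of buffered audio when
    playing at `rate`x while new audio keeps arriving at 1x."""
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0 to ever catch up")
    # Each wall-clock second: +1s of audio arrives, -rate seconds is played.
    return backlog_s / (rate - 1.0)


def choose_rate(backlog_s: float, target_s: float = 0.06,
                max_rate: float = 1.5) -> float:
    """Hypothetical policy: speed up only while the backlog exceeds a small
    target buffer, otherwise play at normal speed."""
    return max_rate if backlog_s > target_s else 1.0


def simulate(backlog_s: float, target_s: float = 0.06,
             max_rate: float = 1.5, tick_s: float = 0.02) -> float:
    """Step the buffer in fixed ticks and return how long (in wall-clock
    seconds) it takes to get back down to the target backlog."""
    elapsed = 0.0
    while backlog_s > target_s:
        rate = choose_rate(backlog_s, target_s, max_rate)
        backlog_s += tick_s          # new audio arrives in real time
        backlog_s -= tick_s * rate   # playback consumes rate * tick of audio
        elapsed += tick_s
    return elapsed


if __name__ == "__main__":
    # After a 2-second stall, catching up at 1.5x takes 2 / 0.5 = 4 seconds.
    print(catch_up_seconds(2.0, 1.5))
    print(round(simulate(2.0), 2))
```

Under these assumptions, a 2-second stall costs about 4 seconds of slightly sped-up speech rather than 2 seconds of lost audio. That trade-off fits the experience described above: nothing is skipped, so the listener can still follow the conversation.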