I think there are some subtleties we take for granted in a face-to-face conversation that simply become awkward in a remote conversation with latency.
Face-to-face, I can avoid most interruptions because I notice slight changes in facial expression, e.g. a mouth slightly opening or eyes lighting up, that signal the other person is about to speak.
But in a remote conversation, those visual cues suffer from the same latency as the audio, so I'm reading them far too late.
I usually don't have to throttle my thoughts on a video call unless there are more than one or two other people on it. But when there are more than three people in a conference room, I have to throttle my thoughts there as well.