by GuiA 2197 days ago
There are hard limits at play. No matter what you do, you can't go from New York to London in less than ~20ms; add video/audio encoding, packet switching, decoding, etc. and it's easy to see why any latency under the 100ms mark at that spatial scale in a scalable, mainstream product would be close to a miracle.
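As a rough sanity check on the ~20ms figure, here is a back-of-the-envelope sketch. The distance (about 5,570 km great-circle) and the fiber refractive index (about 1.47) are my assumed round numbers, not measurements, and real cable routes are longer than the great-circle path.

```python
# Lower bound on one-way New York <-> London propagation delay.
# Assumptions (not from the thread): approximate great-circle distance
# and a typical refractive index for silica optical fiber.
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
NY_LONDON_KM = 5_570              # assumed, approximate
FIBER_REFRACTIVE_INDEX = 1.47     # assumed, typical

def one_way_ms(distance_km: float, speed_km_per_s: float) -> float:
    """Propagation time in milliseconds over distance_km at the given speed."""
    return distance_km / speed_km_per_s * 1000.0

vacuum_ms = one_way_ms(NY_LONDON_KM, C_VACUUM_KM_PER_S)
fiber_ms = one_way_ms(NY_LONDON_KM, C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX)

print(f"vacuum, one way: {vacuum_ms:.1f} ms")
print(f"fiber, one way:  {fiber_ms:.1f} ms")
print(f"fiber, round trip: {2 * fiber_ms:.1f} ms")
```

Under these assumptions the vacuum bound lands a little under 20ms one way, and light in fiber is slower still, before any encoding, switching, or decoding is counted.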

The thing is that when we talk in a room, sound will take <10ms to reach my ears from your mouth. This is what "enables" all of the human turn taking cues in conversation (eye contact, picking up whether a sentence is about to end/whether it's a good time to chime in/etc) - I've been looking for work from people who've tried to see at what point things start feeling really bad (is it 10ms, or 50ms?), but haven't found much so far. No matter what it is though, it's likely that long distance digital communications just cannot match it.

See also this interesting comment about the feeling of "closeness" from phone copper wires:

https://news.ycombinator.com/item?id=22931809

Landlines were so fast and so "direct" in their latency (where distance correlates very directly with time, due to a lack of "hops") that local phone calls were faster than the speed of sound across a table, and for a bit after they came out--before people generally got used to seemingly random latency--local calls felt "intimate", like as if you were talking to someone in bed with their head right next to you; I also have heard stories of negotiators who had gotten really tuned to analyzing people's wait times while thinking that long distance calls were confusing and threw them off their game.
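To put rough numbers on the "faster than sound across a table" claim, a minimal sketch follows. The table width, room size, call distance, and copper velocity factor are all assumed illustrative values, and the calculation only covers propagation in the wire, ignoring any exchange equipment.

```python
# Compare sound travelling through air with a signal travelling down copper.
# All distances and the velocity factor are assumptions for illustration.
SPEED_OF_SOUND_M_PER_S = 343.0        # air at roughly 20 degrees C
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
COPPER_VELOCITY_FACTOR = 0.66         # assumed; varies by cable type

def travel_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Travel time in milliseconds."""
    return distance_m / speed_m_per_s * 1000.0

table_ms = travel_ms(1.0, SPEED_OF_SOUND_M_PER_S)    # assumed 1 m table
room_ms = travel_ms(3.0, SPEED_OF_SOUND_M_PER_S)     # assumed 3 m room
local_call_ms = travel_ms(
    10_000.0,                                        # assumed 10 km local call
    COPPER_VELOCITY_FACTOR * SPEED_OF_LIGHT_M_PER_S,
)

print(f"sound across table: {table_ms:.2f} ms")
print(f"sound across room:  {room_ms:.2f} ms")
print(f"10 km of copper:    {local_call_ms:.3f} ms")
```

With these numbers the wire delay is a small fraction of a millisecond, while sound needs a few milliseconds just to cross the table.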

> it's easy to see why any latency under the 100ms mark at that spatial scale in a scalable, mainstream product would be close to a miracle.

It seems normal phones are able to do it, though. At least, normal phones seem to suffer fewer latency problems.

In a way, simplicity in technology often means better performance.

Linux is ill-suited for realtime applications.

Google is well aware of this, hence Fuchsia.

seL4 would make a good base for such a device.

The media lab has done a ton of research on this. I seem to remember people being able to notice visual latency at 30ms and audio latency at 80-120ms (this is because light is faster than sound).

> and audio latency at 80-120ms

Any rhythm game player will disagree.

Some games (e.g. LLSIF, for Android) have a "perfect" window sized to 16ms (one video frame). Even with latency compensation, these are unplayable over Bluetooth yet fine over the headphone jack. Since the game has calibration, the resulting offset can be seen to be at least 30ms worse on Bluetooth.
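To make the timing window concrete, here is a hypothetical sketch of a hit check with a calibration offset. It is not how any particular game implements judging: the function name, the symmetric window, the 60 fps frame time, and the example latencies are all my assumptions.

```python
# Hypothetical "perfect" judgement with latency calibration.
# Assumes the 16 ms window is a total width centred on the note (+/- 8 ms).
FRAME_MS = 1000.0 / 60.0  # one frame at an assumed 60 fps, about 16.7 ms

def is_perfect(tap_ms: float, note_ms: float,
               calibration_offset_ms: float = 0.0,
               window_ms: float = 16.0) -> bool:
    """True if the tap, minus the calibrated offset, falls inside the window."""
    error_ms = (tap_ms - calibration_offset_ms) - note_ms
    return abs(error_ms) <= window_ms / 2.0

note_ms = 1000.0

# Player reacts to audio that arrives 30 ms late (assumed extra latency).
uncalibrated = is_perfect(note_ms + 30.0, note_ms)
calibrated = is_perfect(note_ms + 30.0, note_ms, calibration_offset_ms=30.0)

# If the real latency drifts to 42 ms while calibration stays at 30 ms,
# the 12 ms residual error no longer fits in a +/- 8 ms window.
drifted = is_perfect(note_ms + 42.0, note_ms, calibration_offset_ms=30.0)

print(uncalibrated, calibrated, drifted)
```

A fixed offset can be calibrated away in this model. Latency that varies from tap to tap cannot, which is one possible reason (an assumption on my part) a narrow window could stay unplayable even after calibration.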

Interesting, would love to read more if specific papers/authors come to your mind. I suspect there's a big gap between e.g. "noticing the audio latency when audio is played as a result of pressing a button" vs "audio latency affecting the flow of a multiparty conversation".

It's probably the latter, because the former is about 5ms (which is equivalent to asking "how short can the time between two sounds be while they are still perceivable as separate?", aka the lower frequency threshold of hearing). It's not obvious that they're the same limit.

> The thing is that when we talk in a room, sound will take <10ms to reach my ears from your mouth. This is what "enables" all of the human turn taking cues in conversation (eye contact, picking up whether a sentence is about to end/whether it's a good time to chime in/etc) - I've been looking for work from people who've tried to see at what point things start feeling really bad (is it 10ms, or 50ms?), but haven't found much so far. No matter what it is though, it's likely that long distance digital communications just cannot match it.

Digital communication could cheat, though!

There's a lot of latency hiding you can do, if you can predict well enough what's coming next. Humans are fairly predictable most of the time.

Where does Tonari actually put the camera? The perspective on the displayed image makes it look like the camera is ceiling mounted, but that would make the eye contact problem much worse than even Zoom.