Hacker News
by jewel 1440 days ago
There's another approach that might work better.

Instead of sampling the mouse position every 100ms, you'd save off all the mouse positions, and then send the latest batch every 100ms. The other side would then replay the exact positions, just delayed by 100ms. It'll end up with the same latency as these motion smoothing approaches, while only using slightly more bandwidth.
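A minimal sketch of the batch-and-replay idea, in Python for illustration. `CursorBatcher` and `replay_schedule` are hypothetical names. The sketch assumes timestamps in milliseconds on a clock that both sides agree on. A real system would have to estimate the offset between the two clocks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Sample = Tuple[float, int, int]  # (timestamp_ms, x, y)


@dataclass
class CursorBatcher:
    """Sender side: record every mouse sample, hand over the batch on flush."""
    pending: List[Sample] = field(default_factory=list)

    def record(self, t_ms: float, x: int, y: int) -> None:
        # Called from the mouse-move handler; nothing goes on the wire yet.
        self.pending.append((t_ms, x, y))

    def flush(self) -> List[Sample]:
        # Called by a 100ms timer; returns the batch and starts a new one.
        batch, self.pending = self.pending, []
        return batch


def replay_schedule(batch: List[Sample], delay_ms: float = 100.0) -> List[Sample]:
    """Receiver side: when to draw each sample, as (play_at_ms, x, y).

    Each position is replayed at its original timestamp plus a fixed delay,
    assuming sender and receiver timestamps are on a shared clock.
    """
    return [(t + delay_ms, x, y) for (t, x, y) in batch]
```

Because each sample keeps its own timestamp, the receiver reproduces the original motion exactly. It just does so `delay_ms` later.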

Yeah, wanted to say the same thing. If latency is the only problem, just batch up the updates and replay on the other end. If bandwidth also becomes a problem, only then start compressing the data. But to be honest, if we can stream video over the internet, we surely have enough bandwidth to stream cursor positions.
Good point. It's also important to consider this in the context of the application that uses those multiplayer cursors: the state of the document has to match the cursors, so it makes sense to keep presence (cursors, selection) and storage (document data) perfectly in sync, even if that means a slight 80-200ms delay.
I think this is one of those situations where “just because we can doesn’t mean we should” applies.

The marginal benefit of exact cursor positions is so low that sending all that data still feels like a waste.

Could also have a positive impact on battery life as it would let the WiFi/cellular radio sleep longer.
What about a mix:

Every 100ms, you fit the best Bézier curve to the last batch of mouse positions and send that.

It seems like that would give a more precise reconstruction than fitting a Bézier curve on the server to only one sample every 100ms.
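Here is one way the fitting step could look. This is a sketch, not the linked article's method. It uses a common textbook approach: a least-squares cubic Bézier with the endpoints pinned to the first and last samples, plus chord-length parameterization. `fit_cubic_bezier` is a hypothetical name. With this approach, the sender would transmit four control points (8 numbers) per 100ms batch.

```python
import math


def _bernstein(t):
    # Cubic Bernstein basis B0..B3 evaluated at t.
    u = 1.0 - t
    return (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)


def fit_cubic_bezier(points):
    """Least-squares cubic Bézier (P0, P1, P2, P3) through a batch of (x, y) samples.

    P0 and P3 are pinned to the first and last samples; P1 and P2 are solved
    for, with each sample's parameter t taken from cumulative chord length.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    p0, p3 = points[0], points[-1]
    dists = [0.0]
    for (ax, ay), (bx, by) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(bx - ax, by - ay))
    total = dists[-1] or 1.0
    ts = [d / total for d in dists]

    # Normal equations: [[a11, a12], [a12, a22]] @ [P1, P2] = [r1, r2], per axis.
    a11 = a12 = a22 = 0.0
    r1 = [0.0, 0.0]
    r2 = [0.0, 0.0]
    for t, q in zip(ts, points):
        b0, b1, b2, b3 = _bernstein(t)
        a11 += b1 * b1
        a12 += b1 * b2
        a22 += b2 * b2
        for k in range(2):
            resid = q[k] - b0 * p0[k] - b3 * p3[k]
            r1[k] += b1 * resid
            r2[k] += b2 * resid
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Too few samples to determine P1 and P2: fall back to a straight segment.
        p1 = tuple(p0[k] + (p3[k] - p0[k]) / 3 for k in range(2))
        p2 = tuple(p0[k] + 2 * (p3[k] - p0[k]) / 3 for k in range(2))
    else:
        # Cramer's rule on the 2x2 system, separately for x and y.
        p1 = tuple((a22 * r1[k] - a12 * r2[k]) / det for k in range(2))
        p2 = tuple((a11 * r2[k] - a12 * r1[k]) / det for k in range(2))
    return p0, p1, p2, p3
```

Batches with fewer than four samples cannot pin down both inner control points. In that case the sketch falls back to a straight segment.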

Agreed. The delay would only be slightly annoying if you're also talking to the person at the same time. But 100ms or even 200ms is an acceptable latency, especially if it's constant. This solution is not just simpler but also more efficient, since you can bundle the data up efficiently and include state changes as well.

There is a lot of prior art in this space, btw. I remember Meteor.js having a great real-time demo over websockets that used predictive techniques to keep perceived latency at (imperfectly, but still impressively) zero.
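A minimal sketch of bundling presence and state changes into one message per tick. Cursor moves and document ops then replay in a single timestamp order, and the cursor never points at text that hasn't arrived yet. The message fields (`seq`, `cursor`, `ops`) and both function names are hypothetical, not any particular library's wire format.

```python
import json


def build_batch(seq, cursor_samples, doc_ops):
    """Pack one tick's presence and storage updates into a single message.

    cursor_samples: [(t_ms, x, y)], doc_ops: [(t_ms, op_dict)].
    The field names are illustrative, not any particular library's wire format.
    """
    return json.dumps({
        "seq": seq,
        "cursor": [[t, x, y] for t, x, y in cursor_samples],
        "ops": [[t, op] for t, op in doc_ops],
    }, separators=(",", ":"))


def replay_events(message):
    """Merge cursor moves and document ops back into one time-ordered stream."""
    batch = json.loads(message)
    events = [(t, 0, ("move", x, y)) for t, x, y in batch["cursor"]]
    events += [(t, 1, ("op", op)) for t, op in batch["ops"]]
    # Sort by timestamp; on a tie the cursor move is applied first (arbitrary but fixed).
    events.sort(key=lambda e: (e[0], e[1]))
    return [event for _, _, event in events]
```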

Honestly, I don't see why you'd need to batch it every 100ms. Sure, you don't want to send a mouse movement every time an event fires, but surely 30fps looks smooth and won't overload the system.
You can't reliably send, fail, retry, and confirm receipt of a TCP packet in 33ms over arbitrary Internet connections. 8ms is right out. I can get 200µs over a local EtherCAT real-time industrial I/O network, but that's with careful management of well-isolated, single-machine network conditions, and it just doesn't work with a cellular modem for download and oversubscribed residential cable for upload. Assuming a latency of 120ms (the default in the linked tutorial) is much more realistic.

And you also can't set up your system with a 5-second delay to send data every 5 seconds, because any jitter will result in hiccups.

You could set up your system to send out new data each time the previous buffer is acknowledged, but that's rather pointless: if you get lucky with a good connection and can send data to be rendered 4.970 to 5.000 seconds from now, what's the difference for the user between doing that and waiting until you have data for 4.900 to 5.000 seconds, which reduces the network load by roughly a factor of 3?

I think 100ms is a reasonable minimum batch size.
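The trade-off in the comment above can be put into numbers with a rough sketch. The 120ms latency comes from the comment. The 30ms jitter margin is an assumption, and `batching_tradeoff` is a hypothetical helper.

```python
def batching_tradeoff(batch_ms, latency_ms, jitter_ms):
    """Packets per second and the smallest playback delay that avoids stalls.

    The first sample of a batch waits a full batch interval before it is sent,
    then spends one-way latency (plus up to `jitter_ms` extra) in transit, so
    playback must run at least that far behind to always have data ready.
    """
    packets_per_s = 1000 / batch_ms
    min_delay_ms = batch_ms + latency_ms + jitter_ms
    return packets_per_s, min_delay_ms


for batch_ms in (30, 100):
    pps, delay = batching_tradeoff(batch_ms, latency_ms=120, jitter_ms=30)
    print(f"{batch_ms} ms batches: {pps:.1f} packets/s, playback >= {delay} ms behind")
```

Under these assumptions, going from 30ms to 100ms batches cuts the packet rate from about 33 to 10 per second, roughly the factor of 3 mentioned above. The cost is 70ms of extra playback delay.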

The biggest problem is TCP. Ping times for multiplayer games on decent Internet connections can be on the order of 10-20ms or even lower.
You're confusing latency with throughput.

I don't need to reliably send the TCP packets in 33ms. I just need to be able to start sending a packet every 33ms. Assuming sending, acking, etc. are non-blocking and use few enough resources asynchronously, it's fine.
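A sketch of the throughput-not-latency point using Python's asyncio. The loop starts a send every 33ms and never waits for a round trip. It assumes `send` is non-blocking; `asyncio.StreamWriter.write`, for example, only buffers data. `send_at_fixed_rate` is a hypothetical name. TCP's in-order delivery can still hold up later packets behind a lost one, which is the separate problem raised about TCP.

```python
import asyncio


async def send_at_fixed_rate(get_position, send, interval_s=0.033, count=None):
    """Start a send every `interval_s` seconds without waiting for acks.

    `send` is assumed to be non-blocking (e.g. it queues bytes on a socket
    transport); delivery, acks and retries happen asynchronously elsewhere,
    so a slow round trip does not lower the send rate.
    """
    loop = asyncio.get_running_loop()
    next_tick = loop.time()
    sent = 0
    while count is None or sent < count:
        send(get_position())
        sent += 1
        # Schedule against absolute ticks so per-iteration overhead doesn't drift.
        next_tick += interval_s
        await asyncio.sleep(max(0.0, next_tick - loop.time()))
    return sent
```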

What we need is UDP websockets! Isn’t that what WebRTC is though?
Just for comparison, the default USB mouse polling rate is 125 Hz, so 8 ms. If that is too often, 16 or 32 ms would make sense, which is close to 60 or 30 Hz/fps, respectively.
Am I missing how that would work? That's a 10-30x increase in the bandwidth needed to send all the mouse positions: 1 position per 100ms per player vs. 10-30 per 100ms per player. Those positions all have to be propagated to the other players, so in the first case 10 players = 10 positions per 100ms, and in the second case it's 100-300 positions per 100ms.
Yeah but stop and think about how little bandwidth it still is.

X and Y can easily be 2 bytes each, 4 bytes total. 100 samples per second is a mere 400 bytes per second. You could do it from a dialup modem from the early 90s!

You could, but I think for most web applications the authors wouldn’t think about binary encoding. So you’d end up with something like:

  {"x":50,"y":56}
Encoded as a UTF-8 string, that's 15 bytes; x100 is 1.5 kB/second/participant.

Ok, I guess that’s still not that much.
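A quick way to check the numbers in this exchange is to compare compact JSON against a plain binary encoding using Python's `json` and `struct` modules. The 100 samples/s and 10 participants come from the comments above. Each participant is assumed to receive every other participant's stream. The figures ignore per-message WebSocket/TCP/IP header overhead, which matters more the smaller each message is.

```python
import json
import struct

sample = {"x": 50, "y": 56}

# Compact JSON, as a typical web app might send it (no spaces).
as_json = json.dumps(sample, separators=(",", ":")).encode("utf-8")

# Binary: two unsigned 16-bit integers, little-endian, no padding.
as_binary = struct.pack("<HH", sample["x"], sample["y"])

samples_per_s = 100
participants = 10
for name, payload in (("JSON", as_json), ("binary", as_binary)):
    per_user = len(payload) * samples_per_s  # bytes per second sent by one participant
    # Each participant receives every other participant's stream.
    fan_out = per_user * (participants - 1)
    print(f"{name}: {len(payload)} B/sample, {per_user} B/s per user, "
          f"{fan_out} B/s received by each of {participants} users")
```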

So you implement spline logic rather than a byte stream? Doesn't make sense to me.
That’ll get compressed a bit too.
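A small illustration with Python's `zlib`, which implements the DEFLATE format used by the WebSocket permessage-deflate extension. It compresses a whole batch of JSON samples. The cursor path is synthetic random-walk data made up for the example. The point is only that the repeated keys and punctuation compress well, not any particular ratio.

```python
import json
import random
import zlib

random.seed(0)
# Synthetic batch: 100 samples of a cursor wandering around one spot.
x, y = 500, 300
batch = []
for _ in range(100):
    x += random.randint(-3, 3)
    y += random.randint(-3, 3)
    batch.append({"x": x, "y": y})

raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
packed = zlib.compress(raw)
print(len(raw), "bytes of raw JSON,", len(packed), "bytes after zlib")
```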
Well spotted. In reality it would more likely be a 4096x4096 plane, encoded as 12 bits + 12 bits = 3 bytes, and probably at 30 FPS, which gives 90 B/s. So the only question is how often you want to send those packets; the bandwidth of the cursor data itself becomes completely irrelevant.
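A sketch of the 12 + 12 bit packing described above. `to_plane`, `pack12` and `unpack12` are hypothetical helpers. Mapping screen pixels onto a shared 4096-step grid by simple scaling is an assumption about how the plane would be defined. At 3 bytes per sample, 30 samples per second is the 90 B/s from the comment.

```python
from typing import Tuple


def to_plane(px: float, size: float) -> int:
    """Map a pixel coordinate in [0, size] onto a shared 0..4095 grid."""
    return min(4095, max(0, round(px / size * 4095)))


def pack12(x: int, y: int) -> bytes:
    """Pack two 12-bit coordinates into 3 bytes: x in the high 12 bits, y in the low 12."""
    if not (0 <= x < 4096 and 0 <= y < 4096):
        raise ValueError("coordinates must fit in 12 bits")
    return ((x << 12) | y).to_bytes(3, "big")


def unpack12(data: bytes) -> Tuple[int, int]:
    """Inverse of pack12."""
    value = int.from_bytes(data, "big")
    return value >> 12, value & 0xFFF
```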
60 FPS = 60 F/s = 6 F/100ms, so where did you get 10-30? That would mean 100-300 FPS; you don't need that much for a cursor.
I like this solution, but would latency create issues? You'd only be sending 100ms worth of motion every ~120ms. Would you just drift 200ms further behind with every passing second (20ms after each 100ms batch)? I think I may be missing something, though.
With this method you actually start with extra playback delay on purpose, and keep playback synced to run about 300ms behind the other user (assuming 100ms latency).

So the remote user packages up 100ms of mouse movement with timestamps and sends it with ~100ms latency. Your side now has a ~100ms buffer from which to play the positions back.

This also removes all playback jitter caused by varying latency (as long as the jitter stays under 100ms).

All of the above numbers are made up for this example. You can adjust the playback delay as much as needed for smooth playback.
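A sketch of the receiver-side buffer described in this comment. It uses the same made-up 300ms playback delay. Drawing by linear interpolation between buffered samples is a design choice of this sketch, not something the comment specifies. `PlaybackBuffer` is a hypothetical name. The `clock_offset_ms` parameter stands in for clock synchronization, which is assumed to be handled elsewhere.

```python
import bisect
from typing import List, Optional, Tuple


class PlaybackBuffer:
    """Receiver-side jitter buffer that draws the remote cursor `delay_ms` in the past.

    Samples arrive as (t_ms, x, y) on the sender's clock; `clock_offset_ms`
    maps them onto the local clock and is assumed to be estimated elsewhere.
    """

    def __init__(self, delay_ms: float = 300.0, clock_offset_ms: float = 0.0):
        self.delay_ms = delay_ms
        self.clock_offset_ms = clock_offset_ms
        self.times: List[float] = []             # local timestamps, kept sorted
        self.points: List[Tuple[int, int]] = []  # (x, y) aligned with self.times

    def add_batch(self, batch) -> None:
        for t, x, y in batch:
            local_t = t + self.clock_offset_ms
            # Batches can arrive out of order; insert so the timeline stays sorted.
            i = bisect.bisect_right(self.times, local_t)
            self.times.insert(i, local_t)
            self.points.insert(i, (x, y))

    def position_at(self, now_ms: float) -> Optional[Tuple[float, float]]:
        """Position to draw at local time `now_ms`, or None before any data."""
        if not self.times:
            return None
        target = now_ms - self.delay_ms
        i = bisect.bisect_right(self.times, target)
        if i == 0:
            return self.points[0]   # playback hasn't reached the first sample yet
        if i == len(self.times):
            return self.points[-1]  # buffer ran dry: hold the last known position
        t0, t1 = self.times[i - 1], self.times[i]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        a = (target - t0) / (t1 - t0)  # bisect_right guarantees t1 > t0
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
```

A long-running version would also drop samples older than the current playback point.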

Thanks, that makes sense!
You've added a crap ton of data, though. The point is to show someone's movement, not every pixel they hovered over.

Sampling a fixed number of positions per second is enough, plus sending every click position in between, since those carry valuable information. The rest is noise.
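A sketch of that filter: keep at most one move per interval, but always keep clicks. `downsample` and the `(t_ms, kind, x, y)` event shape are hypothetical. The 33ms default corresponds to roughly 30 samples per second.

```python
def downsample(events, interval_ms=33):
    """Keep at most one move per `interval_ms`, but keep every click.

    events: time-ordered list of (t_ms, kind, x, y), where kind is "move" or "click".
    """
    kept = []
    next_move_t = None
    for event in events:
        t, kind = event[0], event[1]
        if kind == "click":
            kept.append(event)  # clicks always survive, whatever their timing
        elif next_move_t is None or t >= next_move_t:
            kept.append(event)
            next_move_t = t + interval_ms
    return kept
```

One thing this sketch leaves out: a move that is dropped at the end of a gesture is never sent. A fuller version would also flush the final resting position so the remote cursor doesn't stop up to one interval short.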