Hacker News new | ask | show | jobs
by hpx7 1477 days ago
I've been working on my own realtime networking engine[0] and I think there are a few important points related to network syncing that are not mentioned in this article:

1) Bandwidth. The users internet can only handle so much network throughput, so for fast paced games (where you're sending data to each client at a rate of 20+ frames per second) it becomes important to optimize your per-frame packet size. This means using techniques like binary encoding and delta compression (only send diffs).

2) Server infrastructure. For client-server games, latency is going to be a function of server placement. If you only have a single server that is deployed in us-east and a bunch of users want to play with each other in Australia, their experience is going to suffer massively. Ideally you want a global network of servers and try to route users to their closest server.

3) TCP vs UDP. Packet loss is a very real problem, and you don't want clients to be stuck waiting for old packets to be resent to them when they already have the latest data. UDP makes a major difference in gameplay when dealing with lossy networks.

[0] https://github.com/hathora/hathora

5 comments

> 1) Bandwidth. The users internet can only handle so much network throughput, so for fast paced games (where you're sending data to each client at a rate of 20+ frames per second) it becomes important to optimize your per-frame packet size. This means using techniques like binary encoding and delta compression (only send diffs).

Games like Blizzard's Warcraft III / StarCraft II and Age Of Empire linked here in this thread (1500 archers on a 28.8 k baud modem) and oh so many other games approach that entirely differently: the amount of user inputs users can input is tinier than tiny. So instead of sending diff of the game state they send user inputs and the time at which they happened. Because their engine are entirely deterministic, they can recreate the exact same game state for everybody from only the timed user inputs.

Fully deterministic games engine also allow for lots of easy to reproduce bugs and they also allow for tiny save files.

Negligible network traffic. Tiny save files. Bugs are easy to reproduce. When the game allows it, it's the only reasonable thing to do.

This presents a (relative) vulnerability to cheating. If every computer has the full game state but players aren’t supposed to be able to know some things there is the potential for hacks.

The most obvious version of this in StarCraft is maphacks that let you see through fog of war, although that’s far from the only thing.

Poker meets all the technical requirements here, but sending everyone the contents of all hands would be a disaster.

> Poker meets all the technical requirements here, but sending everyone the contents of all hands would be a disaster

I work in the gambling space. A few notes, gambling games don’t ever rely on physics (even roulette, or a coin dozer type of game, everything is decided by a certified rng, no regulatory body that I am aware of allows outcomes based on physics engines). This means there is far less data to keep state on (a hand of cards is very tiny json blob to send). Games like poker etc. don’t require “real time”, if a player takes 4 seconds to decide if they want to call/raise/fold etc. then an extra 200ms of latency isn’t even going to be noticeable. So we don’t really care if there is a bit of latency, these aren’t FPS games.

Yep - even apparently physics-based digital casino games (think pachinko-style) are not allowed to use the real physics, that's really just faked as an animation to match the strictly controlled odds that can be easily verified by code inspection.
That should be considered cheating!

Games that pretend to be physics-based but in reality have a backed probability engine.

People who gamble understand this, it's literally the law.
This comes up in Minecraft too, and there was a small arms race around it. For the unfamiliar - certain resources in the game are useful and valuable but also rare (diamonds) and requires the player to spend a decent amount of time digging through the ground to find them.

But, since you have the whole game state, you can sift through the data and pinpoint these resources and acquire them quickly and with almost no effort. In multiplayer this is generally considered cheating and is called an "xray" modification to the game client. There are other variations of this hack that involve changing the game's textures to transparent images except for the specific resources you want to find.

Mulitplayer server administrators don't like cheats so they created countermeasures for this. The best example is probably Orebfuscator which "hides" said valuable resources until the player is very close to them.

https://dev.bukkit.org/projects/orebfuscator

Can't you still gain an unfair advantage using Bayesian search theory where probability drops to zero at the "revealing radius"?

Or is the "revealing radius" somewhat randomized over time in a way that's invisible to the client?

I mean, if you can acquire or otherwise reverse-engineer[0] the game seed, you can also just find resources by loading a local copy and noting the coordinates of ore. For major servers, anti-xray plugins will be installed as due diligence, but most of the anti-cheat efforts are focused on detection, reverting, and banning.

Ultimately, if you have a big enough server to attract serious cheaters, you will (or at least should) have tools that can also detect suspicious behavior based on heuristics (i.e. see if a player mined straight to an ore block). Tools like CoreProtect[1] can help detect and revert this.

Ore obfuscation still works very well, however, for the majority of causal cheaters that just googled "hacked minecraft client" and installed the first result.

One ore obfuscation technique used in PaperMC actually sends intentionally fake data to the user to "muddy the waters"[2].

(I know a lot of this because I help develop a Minecraft server management tool)

[0]:https://www.youtube.com/watch?v=GaRurhiK-Lk

[1]:https://github.com/PlayPro/CoreProtect/

[2]:https://docs.papermc.io/paper/anti-xray

https://en.wikipedia.org/wiki/Mental_poker might provide a means by which you could have a match be verifiable by all parties after the fact, but not leak info during it.
Deterministic games suffer from desync and issues with input limitations. It is true that war3 does this, but it has some serious drawbacks.

It also makes the client easier to cheat on and gives one player (the host) a preferential ping.

Most competitive FPS’s use server authoritative instead of replayable deterministic because of this.

If you want to see the limitations, head into the old war3 map editor forums and look up the hacks using automated clicks between placeholder variable units just to move a few bytes of data between clients so they can persist character stats between games.

I never really got into SC2, but Warcraft 3 and AOE2 have pretty major problems with online gameplay due to using deterministic lockstep. Back in the day, it wasn't uncommon for people to lag out in Warcraft 3, which would freeze the game for the whole lobby until either enough time passed that you could kick them, or they stopped whatever was making them lag in the background.

My friends and I actually quit playing AOE2DE because about 1/3-1/2 of team games had someone lagging from the start, which makes the game slow and choppy for every other player (and this was despite the game having a benchmark you had to complete at a certain framerate to play matchmaking online). Spending the next hour in an unplayably laggy mess of a game just isn't fun.

I know Supreme Commander (supcom) also has problems with 1 player lagging causing everyone else to lag.

There's also the matter of it being much harder to actually program a deterministic game, especially once you try and multithread it (which is really necessary for modern RTS, but a nightmare for determinism). Fixing all desyncs is very difficult (AOE2 and WC3 both still have desync bugs. In AOE2DE some people use them to cheat their way up the ranked ladder, desyncing whenever they're losing so it counts as a draw instead of a loss). I've heard the upcoming Sanctuary RTS (indie supcom spiritual successor in development) talked to a bunch of RTS industry vets, and were told that it's not worth it to do a deterministic simulation these days unless you want over ~10k units. AI War 2 [1] has a pretty interesting network model for multiplayer, where they've got a semi-deterministic simulation, and self heal client's simulations if they diverge from the host, which allows for it to be heavily multithreaded and have 10k-100k+ units. You'd probably have to use dedicated servers for a competitive ranked mode if you went that way (and they'd be heavier to host, for a smaller per-game player count than eg. an fps or a survival game).

[1] https://wiki.arcengames.com/index.php?title=Category:AI_War_...

1) Bandwidth is pretty irrelevant now. Even players on cellular networks have megabits of bandwidth. I stopped spending a large amount of time optimizing for packet size while building the networking for Dota 2. Nobody is playing a 14.4k modem anymore.

2) Server placement is still an issue. It's still ~200ms round trip from New York to Sydney for example. Fortunately, cloud infrastructures can make getting servers to closer your players much easier now. You don't have to physically install servers into data centers in the region.

3) Packet loss still occurs, but is incredibly rare that the gap between using TCP and UDP is narrowing. Modern TCP implementations like Microsoft's are amazing at handling loss and retransmission. However, I'd probably use QUIC for game networking if I was to write an engine from scratch these days.

Having worked on a fairly popular .io game mostly played by kids on phones and chromebooks over wifi, I concur with everything you said.

1) We updated around 60Hz, and bandwidth was never an issue (everything was binary encoded and many values were hand compressed to the number of bits they needed, but we didn't run any additional compression or ever feel the need to optimize further, and these were games with 100 players in an instance).

2) Probably the biggest key to success was global server placement. That mattered most, and we ended up renting servers in 10-20 regions around the globe to keep latency down for players. I didn't work on this part, but I know it was quite a bit of work, and also very experimental. Physical proximity of servers didn't always translate to lower latencies. Country borders could by surprisingly laggy, in certain cases.

3) This is the one that really shocked me. As I said, players were engaging with the game in almost the worst conditions imaginable, weak chromebooks over wifi and all communication was over websockets (which basically behave like TCP). Still, packet loss was not an issue. We prioritized low-latency over smoothness, so our sever just blasted out the latest state to the client ~60 times a second, and the client would display mostly like a dumb terminal. This is approximately how you're supposed to do it with UDP, but dropped packets and resends are supposed to make it unworkable over TCP. But we just YOLO'd it with TCP and it worked great! I'm sure at some point in the past before internet infrastructure got so good, it would have been a disaster, but seems like, for most players, we've advanced past that.

Now, I know partially this just weeded out the people with bad connections, but I really don't think it was that many. Certainly my own experiments taking my laptop and checking out various wifi spots around town with wireshark indicated that modern infrastructure is just that good.

(Actually, in terms of improving latency, beyond making sure people were playing on local severs, the next biggest thing was just optimizing javascipt. Both the client and server were written in it, and GC stalls were a huge problem. The eventually solution was to rewrite the server to not generate that much garbage in the first place and then just disable the GC. Reboot the server process after ~10 minutes between games. Another big js issue was code getting de/reoptimized continually. The biggest issue was inconsistent numerical literals between float and integer. Once we figured out it was an issue, we became very disciplined about that.)

I understand if you wish to keep it private, but what was the name of the .io game you worked on?
Another key requirement that must be considered is packet ordering. With games you care about the latest state and thus discarding older out of order packets is a better strategy than waiting for packets to serialize them like TCP would do.
You only care about the latest state for some events. Only events which will soon be superseded by a later event should go over UDP. Move A to X, sent on every frame, fine. Create monster at Y, no.

If you find yourself implementing reliability and retransmission over UDP, you're doing it wrong. However, as I mention occasionally, turn off delayed ACKs in TCP to avoid stalls on short message traffic.

Reliable, no head of line blocking, in order delivery - pick any two. Can't have all three.

Why not use Valve's game networking stuff? Just curious.
There's another way to look at this. The more data you send per packet, the more that can be reasonably interpolated by the client in between updates. Diffs also becomes impossible of course in cases where you're using UDP. So for instance imagine you're only sending visible targets to a player in updates, and then there is a brief stutter - you end up risking having a target magically warp onto the player's screen, which is obv undesirable. Pack in everybody's location (or at least maybe within some as the crow flies radius) and the client experience will break less frequently. Of course like you said though, now the bandwidth goes up.
I’ve written a diffing algorithm using UDP. You tell it to diff against a previous packet with an id. Every so often you send a full key frame packet so they always stay in sync and have full game state.

It works really well and cut my network traffic down by a whole couple orders of magnitude.

The trick is to figure out update grouping so you can create clean groups of things to send and diff on. Ultimately delta compression doesn’t even care what the data is, so modern net stacks do some really efficient compression in this way.

I’ve written a diffing algorithm using UDP. You tell it to diff against a previous packet with an id. Every so often you send a full key frame packet so they always stay in sync and have full game state.

Right. That's how video streams work, too. Every once in a while there's a complete frame, but most frames are diffs.

The key here though is that your server keeps the last N ticks of state (probably around 20) and calculates the diff for each player based on the last id they reported seeing. This way missing an update doesn't get you completely out of sync until the next full state sync, it just gets you a larger diff.
There is a useful intermediate approach, send more entities but use an importance algorithm to control how frequently each has their data sent. Clients keep knowledge of more entities this way, but bandwidth usage/frame can be kept stable.
Sounds like the recent changes star citizen made via the entity component update scheduler
Diffs aren't impossible over UDP. The client should be sending to the server which states it has locally, along with inputs. Then since the server has a (potentially non-exhaustive) list of recent states it knows the client has seen, it can choose to make a diff against the most recent of those and tell the client which state the diff is against. Then the client and server just need to keep a small buffer of pre-diff-encoded packets sent/received.
Another trade off with your approach of sending non-visible entities ahead of time is that it makes wall hacks possible.

Anyone aware of any conceptual way to “encrypt” the location data so it’s only usable if the player has line of sight? I doubt that’s easy/possible but don’t even know where to begin searching for research around topics like that.

Here are some article that address wall hacking - not quite what you're looking for but, still a great read.

https://technology.riotgames.com/news/demolishing-wallhacks-...

https://technology.riotgames.com/news/peeking-valorants-netc...

Not quite what I originally had in mind but interesting idea of storing the remote entity locations in the trusted enclave: https://lifeasageek.github.io/papers/seonghyun-blackmirror.p...
How does UDP work if you're also using delta compression? I would naively expect that the accumulation of lost diff packets over time would cause game state drift among the clients.
The simplest way I've done it: say client and server start on tick 1, and that's also the last acknowledgement from the client that the server knows about. So it sends a diff from 1 to 2, 1 to 3, 1 to 4, until server gets an ack for tick 3, for example. Then server sends diffs from 3 to 5, 3 to 6, etc. The idea is that the diffs are idempotent and will take the client to the latest state, as long as we can trust the last ack value. So if it's a diff from 3 to 6, the client could apply that diff in tick 3, 4, 5 or 6, and the final result would be the same.

This is done for state that should be reliably transmitted and consistent. For stuff that doesn't matter as much if they get lost (explosion effects, or what not), then they're usually included in that packet but not retransmitted or accounted for once it goes out.

This is different (and a lot more efficient) than sending the last N updates in each packet.

> The idea is that the diffs are idempotent and will take the client to the latest state, as long as we can trust the last ack value. So if it's a diff from 3 to 6, the client could apply that diff in tick 3, 4, 5 or 6, and the final result would be the same.

Can you elaborate or give an example of how this works?

Imagine the following changes each tick:

    1: x = 1
    2: x = 2
    3: x = 3, y = 5
    4: x = 4
    5: x = 5
    6: x = 6
    7: x = 7, y = 1
Diff from 2 to 4 would be "x = 4, y = 5".

Diff from 3 to 6 is "x = 6", which will always be correct to apply as long as client is already on ticks 3~6. But if you apply at tick 2, you lose that "y = 5" part. This can't happen in a bug-free code because the server will only send diffs from the latest ticks it knows for sure the client has (because the client sends acks)

Cool thanks, that makes sense! In my head I was thinking the diff from 2 to 4 would be something like "x += 2, y += 5", and 3 to 6 would be "x += 3, y += 0"... which of course wouldn't be idempotent and wouldn't allow you to apply the update to different client states.
You can extend it to practical use by imagining these terms as:

    entity[123].active = true
    entity[123].x = 4
    entity[123].y = 8
Then later...

    entity[123].active = false
And with special rules such that if `active = false`, no other properties of the entity needs to be encoded. And if `active = true` is decoded, it sets all properties to their default value. Then you get a fairly simple way to transmit an entity system. Of course you'd want to encode these properties in a much smarter way for efficiency. But the basic idea is there
That is a fascinating use of idempotence, bravo!
If you get your data small enough to fit multiple updates into a single packet, you can send the last N updates in each packet.

If your updates are bigger; you probably will end up with seqs, acks and retransmitting of some sort, but you may be able to do better than sending a duplicate of the missed packet.

Exactly, you assign a sequence number to each update, have the client send acks to convey which packets it has received, and the server holds onto and sends each unacked update in the packet to clients (this is an improvement over blindly sending N updates each time, you don't want to send updates that you know the client has already received).

If the client misses too many frames the server can send it a snapshot (that way the server can hold a bounded number of old updates in memory).

You just described TCP
It's close but TCP will retransmit frames rather than packing multiple updates in a single frame.

It's common for people to build this kind of retransmission logic on top of UDP (especially for networked games), it's sometimes referred to as "reliable UDP".

It’s not TCP, it’s TCP without head-of-line blocking, which makes it much more suitable for real time games.
TCP forces sequencing across all packets, SCTP is a bit closer.
You don’t delta compress everything, only a significant part of the payload. Each diff is referential to a previous packet with a unique id. If you don’t have the previous packet, you just ignore the update.

Every 30 frames or so, you send a key frame packet that is uncompressed so that all clients have a consistent perspective of world state if they fell behind.

Using sentTime lets clients ignore old data and interpolate to catch up if behind as well.

It does work, I wrote one from scratch to create a multiplayer space rpg and the bandwidth savings were incredible.

How does UDP / packet loss work with delta compression? If you’re only sending diffs and some of them may be lost or received out of order, doesn’t that break delta compression?
The same way it works for video codecs which also send diffs over UDP. There are mechanisms to introduce redundancy in the stream, ask for retransmission, handle missing information.