by skittlebrau 4869 days ago
I worked at Ensemble Studios (Age of Empires), and by the time we shipped Age of Mythology we had much better tools for dealing with sync bugs. Most of the run-of-the-mill ones could be tracked down pretty quickly, but there were still plenty of painful ones. What made them easier to track down was a more advanced tracking/logging system for the simulation state history. The simulation was littered with tons of sync logging code that tracked the execution flow and the values of things as they were calculated/updated. When the state went out of sync, all the machines would dump their last couple of updates' worth of logging (often several gigs), and you could diff the dumps to see where things diverged. If you were making a synced simulation game, a nicely done version of this type of thing would be pretty useful to have, especially if you made a good diff tool to go with it.
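
To make the idea concrete, here is a rough sketch of that kind of sync log in Python. This is not the Ensemble code; `SyncLog`, `first_divergence`, and the `tag=value` line format are names and choices I made up for illustration. Each machine keeps the entries for its last few updates, dumps them on a desync, and a small diff helper finds the first line where two dumps disagree.

```python
from collections import deque
from itertools import zip_longest


class SyncLog:
    """Hypothetical sync log: keeps entries for the last `keep_updates` updates."""

    def __init__(self, keep_updates=2):
        # A bounded deque discards the oldest update automatically.
        self.updates = deque(maxlen=keep_updates)

    def begin_update(self, tick):
        # Start a fresh list of entries for this simulation update.
        self.updates.append((tick, []))

    def log(self, tag, value):
        # Record a labelled value at this point in the execution flow.
        # repr() keeps full float precision, so tiny divergences stay visible.
        self.updates[-1][1].append(f"{tag}={value!r}")

    def dump(self):
        # Flatten the retained updates into one line per logged entry.
        return [f"{tick}: {entry}"
                for tick, entries in self.updates
                for entry in entries]


def first_divergence(dump_a, dump_b):
    """Return (index, line_a, line_b) for the first differing line, or None."""
    for i, (a, b) in enumerate(zip_longest(dump_a, dump_b)):
        if a != b:
            return i, a, b
    return None
```

In use, every machine would call `begin_update(tick)` at the top of each simulation update and `log(...)` wherever a value is computed; after a desync you would feed two machines' `dump()` output to `first_divergence`. A real tool would stream to disk and handle gigabytes, which this sketch does not attempt.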

It's not completely obvious what technology you could make that would make it easier to avoid sync bugs in the first place. You could make a good network command-passing and simulation timing library, but in my experience the majority of problems did not come from bugs in the networking itself. Most of the sync bugs were from things like uninitialized variables, memory overwrites, using user input or other local machine state directly in the simulation without going through a multiplayer command, using a non-synced random number generator in the sim, DirectX changing the FPU rounding mode on you, etc. (Using a "safer" language would help with stuff like uninitialized variables/memory overwrites, of course, but at an inevitable performance cost. Static code analysis tools are pretty good at finding these types of problems now too.)
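
As an illustration of the "non-synced random number generator" item, here is a minimal sketch of a synced RNG in Python. `SyncedRandom` is a hypothetical name, and the generator is a plain 32-bit linear congruential generator that I picked for simplicity, not whatever the actual games used. The point is only that the sim draws from a generator whose state is part of the simulation, seeded with a value every machine agrees on, and uses integer math throughout.

```python
class SyncedRandom:
    """Hypothetical deterministic RNG for simulation code only."""

    # Commonly used 32-bit LCG constants (assumed here for illustration).
    MULT = 1664525
    INC = 1013904223
    MASK = 0xFFFFFFFF

    def __init__(self, seed):
        # The seed must be agreed on by all machines, e.g. sent at game start.
        self.state = seed & self.MASK

    def next_u32(self):
        # Advance the state with integer arithmetic only, so every machine
        # computes exactly the same sequence.
        self.state = (self.MULT * self.state + self.INC) & self.MASK
        return self.state

    def randint(self, low, high):
        # Inclusive range. Uses the high bits, which are better distributed
        # in an LCG; the slight modulo bias is ignored in this sketch.
        span = high - low + 1
        return low + (self.next_u32() >> 16) % span
```

Anything cosmetic (particle effects, UI) should use a separate, local generator, so that rendering differences between machines never change how many numbers the sim's generator has handed out.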