| So sometimes I don't test these projects that much but I did this time. Here are a few thoughts: My biggest goal was "make sure that my bottleneck is serialization or syscalls for sending to the client." Those are both things I can parallelize really well, so I could (probably) scale my way out of them vertically in a pinch. So I tried to pick an architecture that would make that true; I evaluated a ton of different options but eventually did some napkin math and decided that a 64-million uint64 array with a single mutex was probably ok[1]. To validate that I made a script that spins up ~600 bots, has 100 of them slam 1,000,000 moves through the server as fast as possible, and has the other 500 request lots of reads. This is NOT a perfect simulation of load, but it let me take profiles of my server under a reasonable amount of load and gave me a decent sense of my bottlenecks, whether changes were good for speed, etc. I had a plan to move from a single RWMutex to a row-locking approach with 8,000 of them. I didn't want to do this because it's more complicated and I might mess it up. So instead I just measured the number of nanos that I hold my mutex for and sent that to a loki instance. This was helpful during testing (at one point my read lock time went up 10x!) but more importantly it gave me a plan for what to do if prod was slow - I can look at that metric and only tweak the mutex if it's actually a problem. I also took some free wins like using protobufs instead of JSON for websockets. I was worried about connection overhead so I moved to GET polling behind Cloudflare's cache for global resources instead of pushing them over websockets. And then I got comfortable with the fact that I might miss something! 
There are plenty more measurements I could have taken (if there was money on the line I would have measured some things like "number of TCP connections sending 0 moves this server can support" but I was lazy) but...some of the joy of projects like this is the firefighting :). So I was just ready for that. Oh and finally I consulted with some very talented systems/performance engineer friends and ran some numbers by them as a sanity check. It looks like this was way more work than I needed to do! I think I could comfortably 25x the current load and my server would be ok. But I learned a lot and this should all make the next project faster to make :) [1] I originally did my math wrong and modeled the 100x100 snapshots I send to clients as 10,000 reads from main memory instead of 100 copies of 100 uint64s, which led me down a very different path... I'm not used to thinking about this stuff! |
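For anyone who wants to picture the setup described above, here is a rough Go sketch of the idea: one flat uint64 array behind a single RWMutex, a snapshot built as one row-sized copy per row (so 100 copies of 100 uint64s for a 100x100 view), and the lock hold time reported in nanoseconds. This is not the author's code. `Board`, `NewBoard`, `Set`, `Snapshot` and the `observe` hook are all made-up names, the square row-major layout is an assumption (8,000 x 8,000 would give the 64 million cells mentioned), and shipping the numbers to Loki is left out entirely.

```go
package main

import (
	"sync"
	"time"
)

// Board is a square grid of uint64 cells guarded by a single RWMutex.
// All names here are hypothetical; the layout is an assumed row-major array.
type Board struct {
	mu    sync.RWMutex
	side  int
	cells []uint64 // cell (x, y) lives at index y*side + x
	// observe is a hypothetical metrics hook that receives how long the
	// lock was held, in nanoseconds, after the lock has been released.
	observe func(kind string, heldNanos int64)
}

// NewBoard allocates a side x side board. A nil observe hook is allowed.
func NewBoard(side int, observe func(kind string, heldNanos int64)) *Board {
	if observe == nil {
		observe = func(string, int64) {}
	}
	return &Board{side: side, cells: make([]uint64, side*side), observe: observe}
}

// Set writes one cell under the write lock and reports the hold time.
func (b *Board) Set(x, y int, v uint64) bool {
	if x < 0 || y < 0 || x >= b.side || y >= b.side {
		return false
	}
	b.mu.Lock()
	start := time.Now() // start the clock after acquiring: hold time, not wait time
	b.cells[y*b.side+x] = v
	held := time.Since(start)
	b.mu.Unlock()
	b.observe("write", held.Nanoseconds())
	return true
}

// Snapshot copies a size x size window whose top-left corner is (x, y).
// It does one contiguous copy per row instead of size*size single-cell reads.
func (b *Board) Snapshot(x, y, size int) ([]uint64, bool) {
	if size <= 0 || x < 0 || y < 0 || x+size > b.side || y+size > b.side {
		return nil, false
	}
	out := make([]uint64, size*size) // allocate before taking the lock
	b.mu.RLock()
	start := time.Now()
	for row := 0; row < size; row++ {
		src := (y+row)*b.side + x
		copy(out[row*size:(row+1)*size], b.cells[src:src+size])
	}
	held := time.Since(start)
	b.mu.RUnlock()
	b.observe("read", held.Nanoseconds())
	return out, true
}
```

Calling something like `NewBoard(8000, hook)` would allocate roughly 512 MB up front, and `Snapshot(x, y, 100)` would hold the read lock only for the 100 row copies. The hook runs after the lock is released so that slow metrics code cannot stretch the hold time it is trying to measure.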
Funny, when I went there were just over 600 active players and things were running super smoothly, even on my mobile. Kudos!
Do you see this project and the things you’ve tried applying to other future projects?