On the other hand, if you estimate the losses as a percentage of the people who gave up on the margin, you can easily derive losses, relative to what they would have earned without the bug, in the tens of millions of dollars, and hundreds of millions is on the table.
I can't "prove" that, because the data isn't there, but it is a completely reasonable guess. With an 8-minute load time, it's reasonable to assume a lot of people stopped playing, and thus stopped spending, because they thought their install had crashed or was corrupted, and just silently wandered away. I certainly wouldn't wait that long. I've got some games that load more slowly than others, but I don't think I've waited multiple minutes since the Commodore 64 era.
Chiming in here as someone who used to play in competitive GTA leagues. I quit playing because of too much time spent in load screens. It was worse than just this slow loading bug: random disconnects that triggered loading again got more and more common over time, too.
The problem was that the JSON parsing code was doing exceptionally stupid things in the first place (like calling sscanf), which doesn't matter on the small JSON files used in testing but exploded in production once the JSON file grew to several megabytes. Just dropping in a different JSON parser library would probably have been enough to fix the problem on R*'s side.
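For contrast, here is a minimal sketch of pulling numbers out of a comma-separated list with strtol, which reports where it stopped through its end pointer and only reads the characters it parses, so nothing re-measures the rest of the buffer on each step. The function name sum_csv_longs and the input format are assumptions for illustration; this is not R*'s code and not a JSON parser.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: sums a comma-separated list of integers such as
   "1,2,3" and sets *ok to 0 on malformed input. strtol hands back the
   position where it stopped, so each step touches only the bytes it
   actually parses instead of re-measuring the remaining input. */
long sum_csv_longs(const char *s, int *ok)
{
    long total = 0;

    *ok = 1;
    while (*s != '\0') {
        char *end;

        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s || errno == ERANGE) { /* no digits, or out of range */
            *ok = 0;
            return total;
        }
        total += v;
        s = end;
        if (*s == ',') {
            s++; /* skip the separator */
        } else if (*s != '\0') { /* unexpected character */
            *ok = 0;
            return total;
        }
    }
    return total;
}
```

For example, sum_csv_longs("10,20,-5,100", &ok) returns 125 with ok set to 1.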
Why is it "exceptionally stupid"? sscanf is basically a slightly more primitive regex engine than, e.g., PCRE, and I suspect it would work about as fast (if it weren't for that silly strlen() call). There are lexers that are basically just a loop running a match() call over the input string, with a regex matching the next token as the pattern or something like that, and that is not generally considered a stupid way to write a lexer. Why would sscanf be?
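The "loop with a match() call" style of lexer can be written with sscanf too, using %n to learn how many characters each match consumed. A minimal sketch follows; the token classes, the function name count_tokens, and the 63-character token limit are assumptions. How cheap each iteration is depends on the libc: on an implementation that runs strlen over its input first, every call rescans the remainder of the buffer.

```c
#include <stdio.h>

/* Hypothetical lexer: counts identifiers (letters and '_'), integers, and
   single non-whitespace characters in s. Each sscanf call reports through
   %n how many characters it consumed, and the loop advances by that much. */
int count_tokens(const char *s)
{
    char tok[64];
    int n, count = 0;

    for (;;) {
        n = 0;
        /* The leading " " in each format skips whitespace before the token. */
        if (sscanf(s, " %63[A-Za-z_]%n", tok, &n) == 1 ||
            sscanf(s, " %63[0-9]%n", tok, &n) == 1 ||
            sscanf(s, " %1[^ \t\n]%n", tok, &n) == 1) {
            s += n;
            count++;
        } else {
            break; /* only whitespace, or nothing, is left */
        }
    }
    return count;
}
```

For example, count_tokens("foo = 42 + bar;") returns 6.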
Most standard C library implementations, including FreeBSD's libc [0] and glibc [1], implement sscanf like that: they call fscanf on a dummy FILE object whose size is populated by strlen() on every call, with no caching.
Of course, there are implementations whose authors thought about that and decided to do the reasonable thing instead, e.g. musl [2] and Plauger's old stdlib [3].
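The cost of the strlen-per-call design is easy to model. The sketch below is a toy, not glibc's or FreeBSD's code: the hypothetical toy_sscanf_int wraps the string in a hypothetical string_source whose size is filled in by strlen on every call, standing in for the dummy FILE, and tallies how many bytes those strlen calls walk. Parsing k comma-separated numbers this way walks on the order of k² bytes in total, even though each number is only a few characters long.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the dummy FILE object: a cursor plus a size. */
struct string_source {
    const char *p;
    size_t size;
};

/* Running total of bytes examined by the strlen calls below. */
static size_t strlen_bytes = 0;

/* Hypothetical, simplified sscanf(s, "%d%n", out, consumed): unsigned
   decimal integers only, no overflow handling. Like the implementations
   described above, it measures the whole remaining string first. */
static int toy_sscanf_int(const char *s, int *out, int *consumed)
{
    struct string_source src = { s, strlen(s) };
    size_t i = 0;
    int value = 0;

    strlen_bytes += src.size;
    while (i < src.size && src.p[i] >= '0' && src.p[i] <= '9') {
        value = value * 10 + (src.p[i] - '0');
        i++;
    }
    if (i == 0)
        return 0; /* no digits: conversion failed */
    *out = value;
    *consumed = (int)i;
    return 1;
}

/* Parses a list like "1,2,3" with one toy_sscanf_int call per element
   and returns how many elements were read. */
static int parse_list(const char *s, int *sum)
{
    int count = 0, v, n;

    *sum = 0;
    while (toy_sscanf_int(s, &v, &n) == 1) {
        *sum += v;
        count++;
        s += n;
        if (*s != ',')
            break;
        s++;
    }
    return count;
}
```

For a list of 1000 single-digit numbers (a 1999-character string), the strlen calls walk 1999 + 1997 + ... + 1 = 1,000,000 bytes; doubling the list roughly quadruples that total.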
No, the bug is not in Windows. sscanf cannot cache the string length: it can't know that x and x + offset point into the same string, nor can it guarantee that the string was unmodified since the last call.
Windows provides _snscanf_s if you want to keep track of the string length yourself instead of having it recomputed on every call.
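To illustrate the point about modification: the same pointer can name strings of different lengths at different times, so a length cached on one call can be wrong on the next. The buffer contents and the function name parse_twice are arbitrary choices for this sketch; what matters is that the pointer passed to sscanf never changes while the length it would have to compute does.

```c
#include <stdio.h>
#include <string.h>

/* Reuses one buffer for two different inputs. A sscanf that cached
   strlen(buf) from the first call would believe the second input is
   only 2 characters long and stop reading early. */
static int parse_twice(int *first, int *second)
{
    char buf[32];

    strcpy(buf, "42");
    if (sscanf(buf, "%d", first) != 1)
        return 0;

    strcpy(buf, "123456"); /* same address, new length */
    if (sscanf(buf, "%d", second) != 1)
        return 0;

    return 1;
}
```

With a conforming sscanf, the two calls yield 42 and 123456.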
The fix would have nothing to do with caching the string length across multiple calls to sscanf. The fix would be to have sscanf not call strlen on the input string in the first place, and instead only process the input string up to the point where it satisfies the format string or the input string terminates. After all, regular scanf works fine without knowing the length of stdin. As TFA also says:
>To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on a NULL.
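A sketch of that lazy approach for a single %d-style conversion: read one character at a time and stop at the first byte that can't continue the number, which includes the terminating NUL. Nothing here knows or needs the total length, so repeated calls that each advance a few characters cost time proportional to what they consume. The function lazy_scan_int is hypothetical and handles only leading whitespace, an optional sign, and decimal digits, without overflow checks; unlike sscanf, it returns 0 rather than EOF on empty input.

```c
#include <ctype.h>

/* Hypothetical lazy replacement for sscanf(s, "%d%n", out, consumed).
   It never calls strlen: the loops end at the first byte that doesn't
   fit, and the terminating '\0' is simply one such byte. */
int lazy_scan_int(const char *s, int *out, int *consumed)
{
    const char *p = s;
    int sign = 1, value = 0;

    while (isspace((unsigned char)*p)) /* %d skips leading whitespace */
        p++;
    if (*p == '+' || *p == '-') {
        if (*p == '-')
            sign = -1;
        p++;
    }
    if (!isdigit((unsigned char)*p))
        return 0; /* no digits: matching failure */
    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (*p - '0');
        p++;
    }
    *out = sign * value;
    *consumed = (int)(p - s);
    return 1;
}
```

For example, lazy_scan_int("  -17,3", &v, &n) stores -17 in v and 5 in n, leaving the caller positioned at the comma.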
The author's replacement strlen does the "cache the length across calls" thing only because bolting that onto the default strlen was easier than doing the lazy parsing thing, since the latter would have required writing an actual sscanf implementation from scratch.
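Roughly what a cache-across-calls strlen looks like, as a hypothetical sketch rather than the author's actual patch: remember the start and end of the last string measured, and if a later call asks about a pointer inside that range, answer from the cached end instead of walking the bytes again. It silently assumes the buffer hasn't been modified since it was measured, which is exactly what a general-purpose sscanf can't assume.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical length cache: the last string measured. */
static const char *cached_start = NULL;
static const char *cached_end = NULL; /* points at that string's '\0' */

/* Returns strlen(s), reusing the previous result when s points inside
   the last measured string. Only correct if that string has not been
   modified (or freed) since it was measured. Addresses are compared as
   integers because relational comparison of unrelated pointers is
   undefined in C. */
size_t cached_strlen(const char *s)
{
    uintptr_t p = (uintptr_t)s;

    if (cached_start != NULL &&
        p >= (uintptr_t)cached_start && p <= (uintptr_t)cached_end)
        return (size_t)((uintptr_t)cached_end - p);

    size_t len = strlen(s);
    cached_start = s;
    cached_end = s + len;
    return len;
}
```

Truncating a buffer in place after it has been measured shows the hazard: for a buffer holding "abcdef", writing '\0' at index 3 and then asking about buf + 1 still returns the stale 5 rather than 2.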
scanf is a great way to cause a memory overrun or underrun. Don't think so? Go read the docs for the different CRTs out there. None of them really agree on what all the % specifiers mean.
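One standard-C habit that avoids the most common overrun is to always give %s and %[ a maximum field width one less than the destination buffer, leaving room for the terminating NUL. A minimal sketch; the buffer size, the NAME_MAX_LEN macro, and the function name read_name are assumptions. This relies only on the C standard's definition of the field width and doesn't paper over the CRT differences mentioned above.

```c
#include <stdio.h>

#define NAME_MAX_LEN 15 /* hypothetical limit; buffer holds 15 chars + '\0' */

/* Copies the first whitespace-delimited word of input into name, which
   must have room for NAME_MAX_LEN + 1 bytes. The width in "%15s" caps
   how many characters sscanf may store, so a long word is truncated
   instead of written past the end of the buffer. The width must be kept
   in sync with NAME_MAX_LEN by hand. Returns 1 on success, 0 if there
   was no word. */
int read_name(const char *input, char name[NAME_MAX_LEN + 1])
{
    return sscanf(input, "%15s", name) == 1;
}
```

With a 16-byte buffer, reading the 26-letter alphabet stores its first 15 letters plus the terminator instead of overflowing.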