| What I agree on is that if we had modern .NET available we'd get a free 2-3x improvement, it would definitely be great. BUT having said that, if you're into performance but unwilling to use the tools available then that's on you. From the article it seems that you're using some form of threading to create things, but you don't really specify which and/or how. The default C# implementations are usually quite poor performance wise, so if you used for example the default thread pool I can definitively say that I've achieved a 3x speedup over that by using my own thread pool implementation which would yield about the same 30s -> 12s reduction. Burst threading/scheduling in general is also a lot better than the standard one, in general if I feed it a logic heavy method (so no vectorization) then I can beat it by a bit, but not close to the 3x of the normal thread pool. But then if your generation is number heavy (vs logic) then having used Burst you could probably drop that calculation time down to 2-3 seconds (in the same as if you used Vector<256> numerics). Finally you touch on GC, that's definitely a problem. The Mono variant has been upgraded by them over time, but C# remains C# which was never meant for gaming. Even if we had access to the modern one there would still be issues with it. As with all the other C# libraries etc., they never considered gaming a target where what we want is extremely fast access/latency with no hiccups. C# in the business world doesn't really care if it loses 16ms (or 160ms) here and there due to garbage, it's usually not a problem there. Coding in Unity means having to go over every instance of allocation outside of startup and eliminating them, you mention API's that still need to allocate which I've never run into myself. Again modern isn't going to simply make those go away. |
To give some context, things are very complex in our game, we have fully dynamic terrain with terrain physics (land-slides), advanced path-finding of hundreds of vehicles (each entity has its own width and height clearance), trains, conveyors and pipes carrying tens or even hundreds of thousands of individual products, machines, rockets, ships, automated logistics, etc. There is no one thing that could be bursted to get 3x gain. At this point, we'd have to rewrite the entire game in C++.
So what's the reason we use C#? Productivity, ease of debugging and testing, and resilience to bugs (e.g. null dereference won't kill the program). Messing with C++ or even burst would cost us more time and to be honest, the game would possibly not even exist at that point.
Could you share some details about your custom thread pool that got 3x speedup? What was the speedup from? It is highly unlikely that a custom thread pool would have any significant impact on the benchmark in our case. As you can see from Figure 3, threaded tasks run for about 25% of the total time and even with Mono, all tasks are reasonably well balanced between threads. Threads utilization is surely over 90% (there is always slight inefficiency towards the end as threads are finishing up, but that's 100's of ms). An "oracle" thread pool could speed tings up by 10% of 25%, so that is not it.
Vectorization could help too but majority of the code is not easily vectorizable. It's all kinds of workloads, loading data, deserialization, initialization of entities, map generation, precomputation of various things. I highly doubt that automatic vectorization from code generated by IL2CPP would bring more than 20% speedup here. The speedup from burst would mostly come from elimination of inefficient code generated by Mono's JIT, not from vectorization.
For now, we are accepting the Mono tax to be more productive. But I am hoping that Unity will deliver on the CoreCLR dream. In the meantime, my post was meant raise awareness and stir up some discussion, like this one, which is great. I've read lots of interesting thoughts in this comments section.