by TillE 1368 days ago
> It allowed me to make many Nintendo Switch specific optimizations, and even some optimizations for the PC version

Are there significant CPU-specific optimizations that can be made for the Switch / ARMv8 that wouldn't apply to x86-64? I've never really dug into things at that level, I wouldn't know where to begin except for like vector instructions.

5 comments

My understanding is that ARM has weaker memory-ordering guarantees than x86-64, and no way to explicitly trigger cache line flushes from user space. That being said, I imagine most of the optimizations would come from the particular aspects of the GPU/graphics pipeline, which is probably substantially different from the standard PC structure.
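
To make the memory-ordering point concrete, here is a minimal C++ sketch of message passing between two threads. The function name `publish_and_read` and the payload value are made up for illustration. On x86-64 ordinary loads and stores already behave like acquire/release at the hardware level, so code that forgets the ordering annotations often appears to work there; on ARMv8 the annotations are what force the ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: a producer publishes a payload, a consumer waits
// for the flag and then reads the payload.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // release: the write to payload may not be reordered after this store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // acquire: the read of payload may not be reordered before this load
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With the release/acquire pair in place, `publish_and_read()` returns 42 on both architectures. Swapping both orderings for `std::memory_order_relaxed` would make the read of `payload` a data race, and a weakly ordered CPU is where that kind of bug tends to show up first.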
Yeah you can always cut corners with the GPU if you have to, but Factorio has the large problem of making a CPU-intensive deterministic simulation run at 60 Hz.

Thinking about it some more, there are also probably some tweaks you could use for situations with limited resources which wouldn't apply to most PCs, but would affect the Switch.

(Disclaimer: I haven't written assembly for a while so this may not be true any more)

One cool thing about ARM is that vector instructions run in parallel with the regular CPU pipeline. You can do some neat optimizations where you interleave the sequential part of an algorithm so that it runs while the SIMD instructions are executing. However, if you do this yourself, the code is going to be super non-portable. In general, knowing that something is always going to be running on a Switch lets you dig deeper into architecture-specific optimizations. x86 has so much variance that it would take a ton of effort to really dial in performance on the whole menagerie of Intel and AMD CPUs.

CPU-wise, you can optimize specifically for ARM, and it's great to make sure your math is using NEON and that you don't have some pathological slowdown because your code assumes more cores. But you'll still find that the lion's share of your CPU optimization benefits both architectures.
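
As a rough illustration of "making sure your math is using NEON", here is a small C++ sketch that sums an array four floats at a time with NEON intrinsics when building for AArch64, and falls back to a scalar loop elsewhere. The function name `sum_floats` is hypothetical, and in practice a compiler may auto-vectorize the scalar loop anyway.

```cpp
#include <cstddef>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: sum n floats, using NEON on AArch64 builds.
float sum_floats(const float* data, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t acc = vdupq_n_f32(0.0f);            // four running partial sums
    for (; i + 4 <= n; i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(data + i));  // add four lanes at once
    }
    total = vaddvq_f32(acc);                        // horizontal add of the lanes
#endif
    // Scalar tail on NEON builds; the whole array on every other target.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Note that the vector path adds the elements in a different order than the scalar loop, so results can differ in the last bits for general floating-point input. That matters for a deterministic simulation that has to match across platforms.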

For fun, I'll mention that there is a hardware hash function that you can use. The public A57 documentation has a performance guide available that is applicable to the Switch.

https://developer.arm.com/documentation/uan0015/b/
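
The comment doesn't say which instruction it means. Assuming it refers to the ARMv8 CRC32 extension, a sketch of using it from C++ could look like the following. `crc32c` is a hypothetical helper name; the hardware path uses the ACLE intrinsic `__crc32cb` and is only compiled when the toolchain defines `__ARM_FEATURE_CRC32`, otherwise a bitwise software version of the same checksum is used.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

// Hypothetical helper: CRC-32C (Castagnoli) over a byte buffer.
std::uint32_t crc32c(const unsigned char* data, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;  // standard initial value
#if defined(__ARM_FEATURE_CRC32)
    for (std::size_t i = 0; i < n; ++i) {
        crc = __crc32cb(crc, data[i]);  // one CRC32C step per byte in hardware
    }
#else
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // 0x82F63B78 is the bit-reflected CRC-32C polynomial
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
#endif
    return ~crc;  // standard final inversion
}
```

Both paths compute the same value, so a hash table keyed on it behaves identically on PC and on ARM. A real implementation would probably feed the hardware 8 bytes at a time (`__crc32cd`) rather than one.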

Most likely the optimizations are higher level than that (for the most part). With the Switch they will be able to target a specific core count and use a less generic thread/memory synchronization API.

There may be some cases where the fixed ARMv8 instruction set allows them to use instructions that have equivalents on recent x64 processors but not back to their minspec PC.
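
Population count is one example of that kind of instruction (my pick for illustration, not something the comment names). Baseline x86-64 doesn't guarantee POPCNT, so a PC build with an old minspec needs a fallback or runtime dispatch, while a fixed ARMv8 target can use the builtin unconditionally. A sketch, with `count_bits` as a hypothetical name and assuming GCC or Clang for the builtin:

```cpp
#include <cstdint>

// Hypothetical helper: number of set bits in x.
int count_bits(std::uint64_t x) {
#if defined(__GNUC__) && (defined(__aarch64__) || defined(__POPCNT__))
    // Fixed ARMv8 target, or an x86-64 build compiled with POPCNT enabled.
    return __builtin_popcountll(x);
#else
    // Portable fallback for a minspec that can't assume the instruction.
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
#endif
}
```

Here the choice is made at compile time. On PC the alternative is to detect the CPU feature at startup and pick a code path then, which is the extra genericity a single-hardware target lets you drop.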

I have no idea. But if there are, the Factorio devs will find them and write an extremely detailed blog post about the entire process.