Hacker News new | ask | show | jobs
by wk_end 1024 days ago
Yes, exactly. Were I hand-writing the assembly for the 6502, I'd make all sorts of decisions that the C code doesn't - and that a compiler can't - to make it more efficient.

Instead of passing in a pointer to two separate functions, I'd write a single UpdateBalls procedure that operated on global data. This data is going to be core to my game logic and physics, so I'd put it all on the ZP. As you suggested, "structure-of-arrays". I'd choose a fixed number of balls so I don't need an argument; maybe I'd set my loop to iterate backwards so I get a free zero check with the decrement, maybe I'd unroll the loop ("dead" balls can be placed off-screen with a dx/dy of 0). I'd probably decide that I don't need 16-bit precision for the deltas (how fast could the balls move, really?), and a 16-8 addition is going to be quicker than a 16-16 one.

The compiler isn't going to make these optimizations; that's not a slight against the compiler. In fact, I just checked - the output [0] when I write my C code this way is pretty close to what I'd hand-write. It's roughly a third the number of instructions and - I'm not going to cycle count, so this is a stab in the dark - would take maybe an order of magnitude fewer cycles to run. semu wasn't written with performance on the 6502 in mind, it's not going to have taken considerations like this, so it's going to inevitably be slow when compiled.

[0] https://godbolt.org/z/WYKKeh9b7

1 comments

I'd actually like the llvm-mos to do an automated AoS to SoA analysis and rewrite, but haven't gotten around to it yet. There aren't any intrinsic theoretical obstacles I'm aware of though; it's just difficult code to write.

Now that this has come up again as the stock reason "you can't do C well on the 6502", replacing the stack, the zero page, and the register set, I'm probably going to reprioritize it and put the register allocator on pause.