| Nitpick: this isn’t standard C (it uses void main, not int main) Nitpick 2: why ldx #0 txa rts? I would think lda #0 rts is shorter and faster Back to my question: if it can’t, the claim “an assembler programmer couldn't do better” isn’t correct. I think an assembler programmer for the 6502 would consider doing a jmp at the end, even if it makes the function return an incorrect, possibly even unpredictable value. If that value isn’t used, why spend time setting it? A assembly programmer also would: - check whether the routine at 0xFBFD accidentally guarantees to set A to zero, or returns with the X or Y register set to zero, and shamelessly exploit that. - check whether the code at 0xFBFD preserves the value of the accumulator (unlikely, I would guess, but if it does, the two consecutive ‘l’s need only one LDA#) - consider replacing the code to output the space inside “hello world” by a call to FBF4 (move cursor right). That has the same effect if there already is a space there when the code is called. - call 0xFBF0 to output a printable character, not 0xFBFD (reading https://6502disassembly.com/a2-rom/APPLE2.ROM.html, I notice that is faster for letters and punctuation) On a 6502, that’s how you get your code fit into memory and make it faster. To write good code for a 6502, you can’t be concerned about calling conventions or having individually testable functions. |
Regarding this specific unrolled loop, I would expect a 6502 programmer would just write the obvious loop, because they're clearly optimizing for space rather than speed when calling the ROM routine. They'll be content with the string printing taking about as long as it takes, which clearly isn't too long, as they wouldn't have done it that way otherwise. And the loop "overhead" won't be meaningful. (Looks like it'll be something like 7 cycles per character? I'm not familiar with the Apple II. Looks like $fbfd preserves X though.)