Hacker News
by tom_ 1027 days ago
I bet that sizeof(int)==2 - which immediately tells you everything you need to know - and the return value from a function has 8 bits in X and 8 bits in A. So ldx#0:txa is how you load a return value of (int)0.
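To make the register split concrete, here is a minimal C sketch of a 16-bit return value carried in two 8-bit registers. The `Regs6502` type and both helpers are hypothetical names for illustration only. The sketch assumes A holds the low byte and X the high byte (the convention cc65 uses); the comment above does not say which register holds which half.

```c
#include <stdint.h>

/* Hypothetical model of the two 8-bit registers that carry a 16-bit
   return value. Assumption: A = low byte, X = high byte. */
typedef struct {
    uint8_t a; /* accumulator: low byte (assumed) */
    uint8_t x; /* X register: high byte (assumed) */
} Regs6502;

/* Split a 16-bit int into the A/X pair, as a callee would on return. */
static Regs6502 return_int16(int16_t value) {
    uint16_t bits = (uint16_t)value;
    Regs6502 r;
    r.a = (uint8_t)(bits & 0xFFu);
    r.x = (uint8_t)(bits >> 8);
    return r;
}

/* Reassemble the 16-bit value from the A/X pair, as a caller would. */
static int16_t read_int16(Regs6502 r) {
    uint16_t bits = (uint16_t)(((uint16_t)r.x << 8) | r.a);
    return (int16_t)bits;
}
```

Under this model `return_int16(0)` leaves both registers at zero, which is the state that ldx #0 followed by txa produces.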

Regarding this specific unrolled loop, I would expect a 6502 programmer to just write the obvious loop, because they're clearly optimizing for space rather than speed when calling the ROM routine. They'll be content with the string printing taking about as long as it takes, which clearly isn't too long, as they wouldn't have done it that way otherwise. And the loop "overhead" won't be meaningful. (Looks like it'll be something like 7 cycles per character? I'm not familiar with the Apple II. Looks like $fbfd preserves X though.)
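One plausible accounting for that estimate, as a rough C sketch. The cycle counts and instruction sizes are the documented 6502 values with no page crossings. The loop shape (lda text,x / jsr $fbfd / inx / cpx #len / bne) is an assumption rather than something taken from the compiler output, and time spent inside the ROM routine is ignored.

```c
/* Documented 6502 cycle counts, assuming no page crossings. */
enum {
    CYC_LDA_IMM   = 2, /* lda #'c'             */
    CYC_LDA_ABS_X = 4, /* lda text,x           */
    CYC_JSR       = 6, /* jsr $fbfd            */
    CYC_INX       = 2,
    CYC_CPX_IMM   = 2,
    CYC_BNE_TAKEN = 3
};

/* Unrolled form: lda #'c' / jsr $fbfd, repeated once per character. */
static int unrolled_cycles_per_char(void) {
    return CYC_LDA_IMM + CYC_JSR;
}

/* Assumed loop: lda text,x / jsr $fbfd / inx / cpx #len / bne loop. */
static int looped_cycles_per_char(void) {
    return CYC_LDA_ABS_X + CYC_JSR + CYC_INX + CYC_CPX_IMM + CYC_BNE_TAKEN;
}

/* Cycles spent purely on loop control (inx / cpx / bne). */
static int loop_control_cycles(void) {
    return CYC_INX + CYC_CPX_IMM + CYC_BNE_TAKEN;
}

/* Code size in bytes: lda #imm is 2 bytes and jsr is 3. */
static int unrolled_bytes(int chars) {
    return chars * (2 + 3);
}

/* ldx #0 (2) + lda abs,x (3) + jsr (3) + inx (1) + cpx #imm (2) + bne (2),
   plus one byte of string data per character. */
static int looped_bytes(int chars) {
    return 2 + 3 + 3 + 1 + 2 + 2 + chars;
}
```

With these numbers the loop control costs 7 cycles per character (9 if you also count the slower indexed load), while for 15 characters the unrolled form takes 75 bytes against 28 for the loop plus its data.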

2 comments

We did a lot of loop unrolling and self-modifying code back in the day, when making demos for the C64. The branch is really expensive. For example, when clearing the screen you might use 16 STA adr,x instructions and then add 16 to X before you branch back to the top of the loop.
Indeed, in some cases you want the unrolls. The 6502 is good in the twisties, but if you're trying to do any kind of copy or fill then the percentage of meaningful cycles is disappointingly low, and the unroll may be necessary. Also, if you're trying to keep in sync with some other piece of hardware, then just doing it one step at a time can be much easier.
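A back-of-the-envelope sketch in C of why the unroll pays off for fills. It uses the documented 6502 cycle counts (sta abs,x always takes 5 cycles) and steady-state iterations only. Two assumptions that are not in the comments above: the unrolled stores target adr+0 through adr+15, and "add 16 to X" is done as txa / clc / adc #16 / tax.

```c
/* Documented 6502 cycle counts. */
enum {
    CYC_STA_ABS_X = 5, /* sta adr,x (5 cycles, page cross or not) */
    CYC_INX       = 2,
    CYC_BNE_TAKEN = 3,
    CYC_ADD_TO_X  = 8  /* assumed: txa(2) + clc(2) + adc #n(2) + tax(2) */
};

/* Simple fill: sta adr,x / inx / bne loop, one byte per iteration. */
static int simple_fill_cycles(int bytes) {
    return bytes * (CYC_STA_ABS_X + CYC_INX + CYC_BNE_TAKEN);
}

/* Unrolled fill: `stores` consecutive sta instructions, then X += stores
   and one branch. `bytes` is assumed to be a multiple of `stores`. */
static int unrolled_fill_cycles(int bytes, int stores) {
    int per_pass = stores * CYC_STA_ABS_X + CYC_ADD_TO_X + CYC_BNE_TAKEN;
    return (bytes / stores) * per_pass;
}

/* Cycles that actually write memory, for comparison with the totals. */
static int useful_fill_cycles(int bytes) {
    return bytes * CYC_STA_ABS_X;
}
```

Per 16 bytes that comes to 160 cycles for the simple loop, of which only 80 do the stores, against 91 cycles for the 16-way unroll.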

I have done a lot of all of this sort of code and I am quite familiar with the 6502 tradeoffs. But for printing 15 chars by calling a ROM routine, I stand by my comments.

Yes, I compiled with -O3 for maximum speed. That would be an unusual flag choice in most cases.

I just wanted to show 6502 code (so many seem to be able to read it!) side by side with C. x86 would have worked as well, and there the fastest answer would be the same construct too, given the dependency on an external routine.