by tom_
2930 days ago
Regarding the opcode dispatch: setting up the RTS in this way is quite expensive, and (if you've got the room) you could be better off assembling a little thunk somewhere in memory: 4C 00 >SET (JMP >SET*256). You'd do this on startup. A JMP to this thunk costs 3 cycles, and the JMP in the thunk costs 3 cycles, so that buys you nothing compared to the 6-cycle RTS. And the STx to set up the low byte takes 3 cycles (zero page) or 4 cycles (elsewhere), which is the same as or worse than the PHA. But because the high byte is always set up, you save the 5 cycles spent setting that up. (If you're running from RAM, you don't even need the thunk.)

(Also: the opcode dispatch's EOR trick is space-efficient, but takes an extra cycle - while saving one byte, I won't deny - compared to doing a TAY after fetching the byte, then a TYA:AND #$F0 later. That sequence takes 6 cycles, whereas the LSR:EOR (R15L),Y sequence takes 7 or 8.)
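For anyone who wants to check the arithmetic, here is a rough tally in Python using the documented NMOS 6502 cycle counts. The names (`CYCLES`, `total`, and the variables) are made up for this sketch, and it assumes the table load of the low byte is identical in both schemes, so it is left out of both totals.

```python
# Documented NMOS 6502 cycle counts for the instructions discussed above.
CYCLES = {
    "LDA #imm": 2,
    "PHA": 3,
    "RTS": 6,
    "STA zp": 3,
    "STA abs": 4,
    "JMP abs": 3,
    "LSR A": 2,
    "EOR (zp),Y": 5,  # one more cycle if the indexed access crosses a page
    "TAY": 2,
    "TYA": 2,
    "AND #imm": 2,
}


def total(*ops):
    """Sum the cycle counts of a straight-line instruction sequence."""
    return sum(CYCLES[op] for op in ops)


# RTS dispatch: load and push the high byte, push the low byte, then RTS.
rts_dispatch = total("LDA #imm", "PHA", "PHA", "RTS")

# Thunk dispatch: store the low byte into the thunk, JMP to it, JMP out of it.
thunk_zero_page = total("STA zp", "JMP abs", "JMP abs")
thunk_elsewhere = total("STA abs", "JMP abs", "JMP abs")

# Opcode extraction: the EOR trick versus stashing the byte in Y.
eor_trick = total("LSR A", "EOR (zp),Y")
tay_variant = total("TAY", "TYA", "AND #imm")

print(rts_dispatch - thunk_zero_page)  # cycles saved with a zero-page thunk
print(rts_dispatch - thunk_elsewhere)  # cycles saved with the thunk elsewhere
print(eor_trick - tay_variant)         # extra cycles spent by the EOR trick
```

One thing the tally does not show: RTS adds one to the address it pulls, so a table built for the RTS scheme holds handler addresses minus one, while a JMP-based scheme would hold the addresses themselves.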
|
Another option would be self-modifying code (if we can forgo ROM-ming this, which Woz couldn't): the interpreter patches the operand of an absolute JMP instruction to set up the address. That instruction then simply follows the store; there is no need to branch to it. Same as your thunk, but placed inline.
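A toy model of that inline patch, in Python rather than 6502 code, just to show the mechanics. The addresses `JMP_AT` and `HANDLER_PAGE` are hypothetical, and it assumes all handlers share one page so only the low operand byte changes per opcode.

```python
JMP_ABS = 0x4C          # opcode byte for JMP absolute

HANDLER_PAGE = 0x0300   # hypothetical page containing every handler
JMP_AT = 0x0250         # hypothetical address of the inline JMP in RAM

mem = bytearray(0x10000)

# Done once at startup: the opcode and the fixed high byte of the operand.
mem[JMP_AT] = JMP_ABS
mem[JMP_AT + 2] = HANDLER_PAGE >> 8


def jmp_target(memory, pc):
    """Decode the little-endian target of the JMP absolute at pc."""
    assert memory[pc] == JMP_ABS
    return memory[pc + 1] | (memory[pc + 2] << 8)


def dispatch(low_byte):
    """Patch the low operand byte, then 'execute' the JMP that follows."""
    mem[JMP_AT + 1] = low_byte  # stands in for the single store per opcode
    return jmp_target(mem, JMP_AT)


print(hex(dispatch(0x42)))
```

Because the JMP sits directly after the store, the only per-dispatch work is one store to an absolute address plus the JMP itself.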
Ah, the first machine language program I wrote was on the 6502 and used self-modifying code to march through the graphics buffer. Indexed addressing modes were the next chapter in the Rodnay Zaks book.