Thanks! Replying a bit late here since that post is from 2015. One can write anecdotal examples that are biased either way, so let's talk about a single issue: the IX/IY registers.
The IX/IY registers are heavyweight, but one needs to remember the best practices of that era and architecture. In well-optimized Z80 code, IX/IY often hold critical "global variables" that stay in registers across many subroutine calls (think "segment registers": base pointers to important tables or buffers whose addresses are not fixed for the whole program). That beats frequent indirect loads/stores through pointers kept in memory, which first have to be fetched into other registers that often need to be preserved and restored.
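To put rough numbers on that, here is a back-of-the-envelope sketch in Python (just arithmetic, not Z80 code). The T-state costs are the standard documented Z80 timings. The access pattern is my own assumption: reading one byte at a nonzero offset from a table base, either with the base kept in IX or with the base pointer stored in memory and rebuilt in HL/DE each time. The function names are made up for illustration.

```python
# Documented Z80 T-state costs for the instructions involved.
LD_A_IX_D = 19   # LD A,(IX+d)
LD_HL_MEM = 16   # LD HL,(nn)  fetch the base pointer from memory
LD_DE_NN = 10    # LD DE,nn    load the field offset
ADD_HL_DE = 11   # ADD HL,DE
LD_A_HL = 7      # LD A,(HL)
PUSH_RR = 11     # PUSH BC/DE/HL/AF
POP_RR = 10      # POP BC/DE/HL/AF


def indexed_access():
    """Base already lives in IX: a single indexed load."""
    return LD_A_IX_D


def memory_pointer_access(saved_pairs=0):
    """Base pointer lives in memory (assumed pattern): fetch it, add the
    offset, load, and optionally PUSH/POP register pairs that the caller
    needs preserved."""
    access = LD_HL_MEM + LD_DE_NN + ADD_HL_DE + LD_A_HL
    return access + saved_pairs * (PUSH_RR + POP_RR)


print(indexed_access())           # base kept in IX
print(memory_pointer_access())    # HL and DE free to clobber
print(memory_pointer_access(2))   # HL and DE must be preserved
```

Under those assumptions the indexed load costs 19 T-states against 44 for the pointer-in-memory version, or 86 if both HL and DE have to be saved and restored. Real code can amortize the pointer fetch over several accesses, so treat this as the worst case for the non-IX version, not as a benchmark.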
You can also use some relatively low-latency instructions that involve IX/IY. In particular, PUSH/POP were often used in optimized buffer-copy routines: you burn all the registers (AF, BC, DE, HL, their alternate set, plus IX and IY) to fetch up to 20 bytes of contiguous data with POPs, then you repoint the SP register and issue PUSHes in reverse order to store those 20 bytes at another location. Loop if needed for more than 20 bytes; even with loop overhead this is faster than LDIR/LDDR. Games used that trick all the time for block copies such as sprite blits or double-buffered animation.
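A quick sanity check of that claim, again as plain Python arithmetic over the documented Z80 T-state costs. The structure of the copy is my assumption: one unrolled 20-byte block, SP set with LD SP,nn before each phase, and one EXX plus one EX AF,AF' per phase to reach the alternate registers. Loop overhead, DI/EI, and saving and restoring the real stack pointer are deliberately left out.

```python
# Documented Z80 T-state costs.
POP_RR, POP_IDX = 10, 14     # POP AF/BC/DE/HL vs. POP IX/IY
PUSH_RR, PUSH_IDX = 11, 15   # PUSH AF/BC/DE/HL vs. PUSH IX/IY
EXX, EX_AF = 4, 4            # EXX and EX AF,AF'
LD_SP_NN = 10                # LD SP,nn
LDIR_REPEAT, LDIR_LAST = 21, 16  # per byte while repeating / final byte


def stack_copy_tstates():
    """One 20-byte block: 8 main/alternate pairs plus IX and IY."""
    pops = 8 * POP_RR + 2 * POP_IDX
    pushes = 8 * PUSH_RR + 2 * PUSH_IDX
    bank_swaps = 2 * (EXX + EX_AF)   # once while popping, once while pushing
    sp_setup = 2 * LD_SP_NN          # point SP at source, then at destination
    return pops + pushes + bank_swaps + sp_setup


def ldir_tstates(n_bytes):
    """LDIR copying n_bytes (BC = n_bytes on entry)."""
    return LDIR_REPEAT * (n_bytes - 1) + LDIR_LAST


print(stack_copy_tstates(), stack_copy_tstates() / 20)
print(ldir_tstates(20), ldir_tstates(20) / 20)
```

That works out to 262 T-states for the stack copy against 415 for LDIR on the same 20 bytes, so there is a fair amount of headroom left for loop overhead and SP bookkeeping before LDIR catches up.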
I'd be more convinced by realistic benchmarks. Yes, the Fib2 that I quoted before is not impressive even by the standards of CPU microbenchmarks, but maybe someone knows of real-world code that had good ports to both CPUs and could give a better verdict. Unfortunately, games are almost never good choices: the 8-bit systems had radically different architectures for essential features like video and audio, so "ports" were often full rewrites, even at a high level such as rendering strategy. There might be exceptions, like the AI component of a chess game.