| HN Mirror


	by shawn_w 61 days ago
	Quite a few architectures have a dedicated 0 register.

4 comments

repelsteeltje 61 days ago

Yep. The XOR trick, which relies on a special use of an opcode rather than on a special register, is probably related to the limited number of (general-purpose) registers in typical 1970s-era CPU designs (8080, 6502, Z80, 8086).

classichasclass 61 days ago

Unfortunately, the 6502 can't XOR the accumulator with itself. I don't recall if the Z80 can, and loading an immediate 0 would be most efficient on those anyway.

blywi 61 days ago

XOR A absolutely works on the Z80, and it is of course faster and shorter than loading a zero value with LD A,0. LD A,0 is encoded in 2 bytes, while XOR A is a single-byte opcode. XOR A has the additional benefit of also clearing all the flags to 0. SUB A will also clear the accumulator, but it will always set the N flag on the Z80.

eichin 61 days ago

Yeah, the article seems to have missed what is likely the biggest reason that this is the popular x86 idiom: it was already the popular 8080/Z80 idiom from the CP/M era, and there's a direct line from one to the other. (A bunch of early 8086 DOS applications were mechanically translated assembly code, so while they are "different" architectures, they're still solidly related.)

classichasclass 61 days ago

Ah, thanks, I couldn't recall off the top of my head.

dmitrygr 61 days ago

should set Z too
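
To make the flag behaviour concrete, here is a minimal Python sketch of the documented Z80 flag bits after an 8-bit XOR and SUB. The function names `xor8` and `sub8` are made up for illustration, and the two undocumented bits of F (bits 3 and 5) are ignored, so treat it as a toy model rather than an emulator.

```python
from typing import Tuple

# Documented Z80 flag bits in the F register.
S, Z, H, PV, N, C = 0x80, 0x40, 0x10, 0x04, 0x02, 0x01


def parity_even(value: int) -> bool:
    """True when the low 8 bits contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def xor8(a: int, b: int) -> Tuple[int, int]:
    """Model of XOR: returns (result, flags). H, N and C are always reset."""
    result = (a ^ b) & 0xFF
    f = 0
    if result & 0x80:
        f |= S
    if result == 0:
        f |= Z
    if parity_even(result):
        f |= PV  # P/V acts as a parity flag for logical operations
    return result, f


def sub8(a: int, b: int) -> Tuple[int, int]:
    """Model of SUB: returns (result, flags). N is always set."""
    result = (a - b) & 0xFF
    f = N
    if result & 0x80:
        f |= S
    if result == 0:
        f |= Z
    if (a & 0x0F) < (b & 0x0F):
        f |= H  # borrow from bit 4
    if (a ^ b) & (a ^ result) & 0x80:
        f |= PV  # P/V acts as an overflow flag for arithmetic
    if a < b:
        f |= C  # borrow out of bit 7
    return result, f


a = 0x5A  # arbitrary accumulator value
print("XOR A ->", xor8(a, a))  # A = 0, F has Z and P/V set
print("SUB A ->", sub8(a, a))  # A = 0, F has Z and N set
```

Under this model XOR A leaves Z and P/V set (zero result, even parity) and everything else clear, while SUB A leaves Z and N set.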

repelsteeltje 61 days ago

You're absolutely right, I stand corrected.

The 6502 gets by with an immediate load: 2 clock cycles, 2 bytes (frequently followed by a single-byte register transfer instruction). Out of curiosity, I did a quick scan of the MOS 1.20 ROM of the BBC Micro:

  LDY #0 (a0 00): 38 hits
  LDX #0 (a2 00): 28 hits
  LDA #0 (a9 00): 48 hits
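
A scan like this can be sketched in a few lines of Python. This is only a guess at the method, not the script actually used: it does a naive byte-pair search, so it will also count matches that happen to fall inside data tables or across instruction operands. The sample bytes are made up and are not taken from the MOS ROM.

```python
from typing import Dict

# Byte patterns for the three immediate loads of zero.
PATTERNS = {
    "LDY #0": bytes([0xA0, 0x00]),
    "LDX #0": bytes([0xA2, 0x00]),
    "LDA #0": bytes([0xA9, 0x00]),
}


def count_byte_pattern(rom: bytes, pattern: bytes) -> int:
    """Count every offset at which `pattern` occurs in `rom`."""
    hits = 0
    for offset in range(len(rom) - len(pattern) + 1):
        if rom[offset:offset + len(pattern)] == pattern:
            hits += 1
    return hits


def scan(rom: bytes) -> Dict[str, int]:
    """Return the number of raw hits for each pattern."""
    return {name: count_byte_pattern(rom, pat) for name, pat in PATTERNS.items()}


# Made-up fragment: LDA #0, TAX, TAY, LDX #0, LDA #0, RTS.
sample = bytes([0xA9, 0x00, 0xAA, 0xA8, 0xA2, 0x00, 0xA9, 0x00, 0x60])
for name, hits in scan(sample).items():
    print(f"{name}: {hits} hits")
```

To run it on a real image, read the file with `open(path, "rb").read()` and pass the bytes to `scan`. A disassembler-driven count would be more accurate, because it only looks at instruction boundaries.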

tom_ 61 days ago

Are you sure you're not an LLM? There is no way anybody writing 6502 would do anything else, because there's no other way to do it.

(You can squeeze in a cheeky Txx instruction afterwards to get a 2-or-more-for-1, if that would be what you need - but this only saves bytes. Every instruction on the 6502 takes 2+ cycles! You could have done repeated immediate loads. The cycle count would be the same and the code would be more general.)
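
The bytes-versus-cycles point can be checked with a small tally. This is a sketch using the standard NMOS 6502 figures (immediate loads: 2 bytes, 2 cycles; TAX/TAY: 1 byte, 2 cycles); the tuple layout and the `cost` helper are just illustrative names.

```python
# (mnemonic, size in bytes, cycles) for the instructions discussed above.
LDA_IMM = ("LDA #0", 2, 2)
LDX_IMM = ("LDX #0", 2, 2)
LDY_IMM = ("LDY #0", 2, 2)
TAX = ("TAX", 1, 2)
TAY = ("TAY", 1, 2)


def cost(sequence):
    """Return (total bytes, total cycles) for a list of instructions."""
    total_bytes = sum(size for _, size, _ in sequence)
    total_cycles = sum(cycles for _, _, cycles in sequence)
    return total_bytes, total_cycles


with_transfers = [LDA_IMM, TAX, TAY]      # zero A, then copy to X and Y
with_loads = [LDA_IMM, LDX_IMM, LDY_IMM]  # three independent immediate loads

print("transfers:", cost(with_transfers))  # (4, 6)
print("loads:    ", cost(with_loads))      # (6, 6)
```

Zeroing all three registers costs 6 cycles either way; the transfer version only saves 2 bytes.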

repelsteeltje 60 days ago

> Are you sure you're not an LLM?

Hard to tell, but I don't think so ;-)

I suppose using Txx instructions rather than LDx is more of an idiom than a way to conserve space. Also, could an LDx #0 potentially take 3 cycles in the edge case where the PC crosses a page boundary? (I'm probably confused? Red herring?)

tom_ 60 days ago

I don't know how the 6502's PC increment actually worked, but it was an exception to the general rule that page crossings (or the possibility thereof) either incurred a penalty or, as was also sometimes the case, were just ignored entirely. (One big advantage of the latter approach: doing nothing does take 0 cycles.)

The full 16 bits would be incremented after each instruction byte fetched, and it didn't cost any extra if there was a carry out of the low byte.

bonzini 61 days ago

The Z80 can do either LD A,0 or SUB A or XOR A, but the LD is slower due to the extra memory cycle to load the second byte of the instruction.
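
For reference, the three options can be laid out side by side. The figures below are the usual documented Z80 encodings and T-state counts; the dictionary layout and the `summary` helper are purely illustrative.

```python
# mnemonic -> (encoding, T-states) on the Z80.
ZERO_A = {
    "LD A,0": (bytes([0x3E, 0x00]), 7),  # opcode fetch plus an operand read
    "SUB A": (bytes([0x97]), 4),         # opcode fetch only
    "XOR A": (bytes([0xAF]), 4),         # opcode fetch only
}


def summary(mnemonic: str) -> str:
    """Format one row: mnemonic, hex encoding, size and T-states."""
    encoding, t_states = ZERO_A[mnemonic]
    return f"{mnemonic:7} {encoding.hex(' '):6} {len(encoding)} byte(s), {t_states} T-states"


for name in ZERO_A:
    print(summary(name))
```

The 3 extra T-states of LD A,0 are the second memory cycle that fetches the immediate operand.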

wongarsu 61 days ago

And [as mentioned in the article] even modern x86 implementations have a zero register. So you have this weird special opcode that (when called with identical source and destination) only triggers register renaming.

bonzini 61 days ago

A move on SPARC is technically an OR of the source with the zero register. "mov %l0, %l1" is assembled as "or %g0, %l0, %l1". So if you want to zero a register, you OR %g0 with itself.
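
The idea of building moves and clears out of OR with a zero register can be sketched with a toy register file. The class and helper names here are hypothetical, and the model ignores SPARC register windows entirely; register 0 simply always reads as 0 and discards writes.

```python
class RegisterFile:
    """Toy 32-bit register file where register 0 is hardwired to zero."""

    def __init__(self, count: int = 32) -> None:
        self._regs = [0] * count

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # writes to the zero register are discarded
            self._regs[index] = value & 0xFFFFFFFF


def op_or(rf: RegisterFile, rs1: int, rs2: int, rd: int) -> None:
    """The one real instruction: rd = rs1 | rs2."""
    rf.write(rd, rf.read(rs1) | rf.read(rs2))


def mov(rf: RegisterFile, rs: int, rd: int) -> None:
    """Synthetic move: OR the source with the zero register."""
    op_or(rf, 0, rs, rd)


def clr(rf: RegisterFile, rd: int) -> None:
    """Synthetic clear: OR the zero register with itself."""
    op_or(rf, 0, 0, rd)


rf = RegisterFile()
rf.write(5, 0xDEADBEEF)
mov(rf, 5, 6)   # register 6 now holds 0xDEADBEEF
clr(rf, 5)      # register 5 now holds 0
print(hex(rf.read(6)), rf.read(5))
```

With a zero register available, neither operation needs its own opcode, which is why the XOR trick is unnecessary on these architectures.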

lynguist 61 days ago

Indeed!!

MIPS - $zero

RISC-V - x0

SPARC - %g0

ARM64 - XZR

classichasclass 61 days ago

PowerPC: "r0 occasionally" (with certain instructions like addi, though this might be better considered an edge case of encoding)

Findecanor 61 days ago

On 64-bit ARM, the same register number is XZR in some instructions and the stack pointer in others.

Alpha: r31, f31

Very few architectures have a NaT bit though.

signa11 61 days ago

indeed. riscv for instance. also, afaik, xor’ing is faster. i would assume that someone like mr. raymond would know…

IshKebab 61 days ago

> afaik, xor’ing is faster

Even tiny tiny CPUs can do sub in one cycle, so I doubt that. On super-scalar CPUs xor and sub are normally issued to the same execution units so it wouldn't make a difference there either.

tliltocatl 61 days ago

On superscalars, running the xor trick as-is would be significantly slower because it implies a data dependency where there isn't one. But all out-of-order x86 CPUs optimize it away internally.

IshKebab 61 days ago

Sub has the same false data dependency.
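
The false-dependency point for both instructions can be illustrated with a toy rename-stage check. This is a sketch under simplifying assumptions (two-operand instructions only, made-up `Instr` type and function names), not a description of how any particular core implements it.

```python
from typing import NamedTuple, Set


class Instr(NamedTuple):
    op: str   # e.g. "xor", "sub", "add"
    dst: str  # destination register, also the first source
    src: str  # second source register


# Ops whose result is zero whenever both operands are the same register.
ZEROING_OPS = {"xor", "sub"}


def is_zeroing_idiom(instr: Instr) -> bool:
    return instr.op in ZEROING_OPS and instr.dst == instr.src


def input_dependencies(instr: Instr) -> Set[str]:
    """Registers whose old values the instruction must wait for."""
    if is_zeroing_idiom(instr):
        return set()  # result is known to be 0, so no inputs are needed
    return {instr.dst, instr.src}


for instr in (Instr("xor", "eax", "eax"),
              Instr("sub", "eax", "eax"),
              Instr("xor", "eax", "ebx")):
    print(instr, "->", sorted(input_dependencies(instr)))
```

Read literally, both `xor eax, eax` and `sub eax, eax` depend on the old value of eax; a core that recognises the idiom can drop that dependency for either one.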

pif 61 days ago

Which part of "mathematical operations don’t reset the NaT bit" did you not understand?
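
For anyone who skipped that part, the quoted rule can be modelled in a few lines. This is a toy sketch, assuming only what the quote says (arithmetic and logical results inherit the NaT bit of their inputs) plus a hardwired zero register; the `Reg` type and helper names are made up.

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFFFFFFFFFF  # 64-bit values


@dataclass(frozen=True)
class Reg:
    value: int
    nat: bool = False  # "Not a Thing" marker carried alongside the value


def xor(a: Reg, b: Reg) -> Reg:
    """The result is NaT if either input is NaT."""
    return Reg(a.value ^ b.value, a.nat or b.nat)


def sub(a: Reg, b: Reg) -> Reg:
    """Same propagation rule as xor."""
    return Reg((a.value - b.value) & MASK, a.nat or b.nat)


ZERO = Reg(0)  # stands in for a hardwired zero register


def mov(src: Reg) -> Reg:
    """Copying the zero register gives a clean zero."""
    return Reg(src.value, src.nat)


poisoned = Reg(0x1234, nat=True)
print(xor(poisoned, poisoned))  # value 0, but nat is still True
print(sub(poisoned, poisoned))  # value 0, but nat is still True
print(mov(ZERO))                # value 0, nat False
```

In this model xor and sub both produce a zero value that is still marked NaT, so only the copy from the zero register gives a usable zero.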
