Hacker News new | ask | show | jobs
by thecompilr 1306 days ago
Parity is one of the cheapest thing you can compute in hardware
2 comments

Compared to what? The sign flag has zero cost since it's just the top bit of the result. The zero flag is one NOR gate to compute. You get the carry flag for free with addition. (Although processors usually throw a lot of other functionality into carry (such as shifting) so the carry flag often ends up complicated.) A signed overflow flag is confusing but one gate to implement. In comparison, parity is difficult. You'll note that other early microprocessors such as the 6800, 6502, and 68000 did not have a parity flag. Same with early computers such as the System/360 and PDP-11.

Parity is expensive to compute with 1970s hardware. If you look at the die of the TMX 1795, the parity flag computation is a substantial part of the die, about half the size of the register file or the ALU. The first problem is that XOR is an inconvenient gate to implement, especially if you use standard MOS logic. Processors usually have special pass-transistor tricks to make XOR more compact. The second problem is that parity needs to XOR all the bits together, so you don't get parallelism.

While I know you already know this, technically all of the flags have large prop. delay and nothing is for free. :D

Computing any of them is equal amounts of delay if you want to mux/select any of them to output, and then on top of that you have to sequentially NOR-chain (and/or some hierarchical manner) the result for zero, and waiting on the adder carry chain for the carry flag. If you were sequentially (and/or some hierarchical manner) XOR-chaining for parity in parallel, is it really that much more delay? Especially if XOR uses domino logic etc.

I see a parity in the same realm as the adder carry chain as far as prop. delay goes (for computing sign/carry at the end.) Of course carry chains can be made more efficiently than a direct sequential chain, but you could be hierarchical about xor-chains for parity too..

Yes, addition is annoyingly slow due to carry propagation. But a problem with the parity flag is that it is computed on the result of your arithmetic operation, so it add another big delay to every operation. The other problem is that it adds a lot of circuitry (by 1970s standards).
Yeah agreed -- though that's what I meant about the zero flag: NOR instead of XOR combinatorial complexity, but same prop. delay critical path, only simpler circuits per bit if purely NMOS or PMOS! :) I think the 1970s Intel were doing MOSFETs then?

Technically parity could become a stable value before the zero flag could! ;)

Oh but I did forget that a single multiple input NOR could be large but without exponential amounts of gates

Obviously compared to any other ALU operation, except the most basic binary ops. It is just a parity of 8 bits, so even in a straightforward implementation you only have three levels of XOR, it is not completely serial, you do get to parallelize some. I do agree that it is not free on an 8080, but it is still a lot cheaper than add or sub.
How is the zero flag one NOR gate? Don't you have to NOR all the bits?
One very large NOR gate. But unlike XOR, an n-bit NOR or NAND is just 2n transistors in CMOS (at least in theory).
One biiiig NOR gate.
Yes, one 8-bit wide NOR gate. This is easy to implement in NMOS, basically you have a wire the width of the ALU and attach a transistor driven by each bit to pull it low.
By itself, sure, but fidelity to x86 requires the PF calculation be made alongside a wide range of common instructions, which is a significant cost either for the microarchitecture or for an emulator.
Yes the problem is that instructions such as CMP, that are often at the end of a basic block, will leave around PF for later use in other basic blocks (which likely will never happen but you cannot know!).