XOR can do everything in 1 cycle (which is hopefully far, far less than the clock). SUB, if done the simple way, has to take n cycles, where n is the number of bits subtracted.
That's just not true. You think subtracting/adding 64-bit numbers actually takes 64 cycles?
There is a sequential implementation of the ripple carry adder that uses a clock and a register, and it adds one bit per cycle, but nobody uses this for obvious reasons; it's just a toy example for education. A normal ripple carry adder is combinational: it has some propagation delay before the output is valid, but that delay is much less than a clock cycle. You can also design a dedicated adder circuit for each width (4-bit, 8-bit, 16-bit, etc.) that greatly reduces the propagation delay to only 2 or 3 levels of gates, instead of the n stages of a ripple carry adder.
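To make the distinction concrete, here is a rough Python sketch (not real hardware, just a software model) of the same full adder used two ways: clocked one bit per cycle, and chained combinationally so the whole sum settles within one cycle. The function names and the cycle/stage counters are made up for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR/AND/OR."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def bit_serial_add(x, y, n):
    """Clocked model: ONE full adder plus a carry register.

    Each loop iteration stands for one clock cycle, so an n-bit
    add costs n cycles. Returns (sum mod 2**n, cycles used).
    """
    carry = 0  # models the carry flip-flop
    result = 0
    cycles = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
        cycles += 1
    return result, cycles


def ripple_carry_add(x, y, n):
    """Combinational model: n full adders chained together.

    The loop stands for wiring, not time: the carry ripples through
    n stages inside a single clock cycle.
    Returns (sum mod 2**n, full-adder stages on the carry path).
    """
    carry = 0
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, n


print(bit_serial_add(5, 3, 8))
print(ripple_carry_add(200, 100, 8))
```

The logic is identical in both functions; the only difference is whether the n steps are spread over time (cycles) or over space (gates). That is the trade-off being argued about here.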
Right. In other words, the clock cycle is already made to be long enough to allow a word-sized SUB to settle. An XOR-with-self surely settles faster, but it still has to wait for that same clock cycle before proceeding.
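A toy model of that point, with settle times I made up purely for illustration (real numbers depend on the process and the design, and this ignores setup time, clock skew, and so on):

```python
# Hypothetical settle times in picoseconds; illustrative values only.
SETTLE_PS = {"xor": 50, "sub": 400}


def clock_period_ps(settle_times):
    """The clock must be at least as long as the slowest single-cycle op."""
    return max(settle_times.values())


def cycles_needed(op, settle_times):
    """Whole clock cycles an op occupies (ceiling division)."""
    period = clock_period_ps(settle_times)
    return -(-settle_times[op] // period)


print(clock_period_ps(SETTLE_PS))
print(cycles_needed("xor", SETTLE_PS), cycles_needed("sub", SETTLE_PS))
```

Under these assumptions both ops take exactly one cycle: XOR finishes early and then simply waits for the clock edge.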
> but nobody uses this for obvious reasons; it's just a toy example for education.
SERV has entered the chat!
It has one upside besides education, and that is that it can be implemented with fewer gates. If you for some reason need parallelism on the core level rather than the bit level, you can cram in more cores with bit-serial ALUs in the same space.