In the ARM instruction encoding, every arithmetic and logical instruction is "conditional". The destination register is either updated or not, depending on the four-bit condition field and the state of the condition flags in the processor.
As a simple contrived example, consider the following C code:
    int a[100], b[100], count;
    ...
    for (int i = 0; i < 100; i++) {
        if (a[i] > b[i]) count++;
    }
Without conditional execution, one might compile this to code that uses a branch to either increment count or not; on ARM it would be more idiomatic to use conditional execution. Here's a very literal translation as an example (not tested, apologies for any inadvertent errors):
    // setup: a in R0, b in R1, count in R2, i in R3.
    loop:   LDR   R4, [R0, R3, LSL #2]   // load a[i]
            LDR   R5, [R1, R3, LSL #2]   // load b[i]
            CMP   R4, R5                 // if a[i] > b[i]
            ADDGT R2, R2, #1             //   count++
            ADD   R3, R3, #1             // i++
            CMP   R3, #100               // if (i < 100)
            BLT   loop                   //   continue loop
The fourth instruction, ADDGT, is conditional. Count is only updated with the result of the addition if the "greater than" condition is satisfied (the flags were set by the preceding CMP instruction). To be more precise, all of the instructions here are conditional; it's just that for most of them the condition field is 1110, meaning "always".
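To make the difference concrete at the C level, here is a rough sketch of the two shapes the loop can take. The function names are made up for illustration, and a real compiler is free to emit either form for either source, so treat this as a model of the idea rather than a statement about what any particular toolchain produces.

```c
#include <stddef.h>

/* Branching shape: the increment sits behind a conditional jump. */
int count_greater_branch(const int *a, const int *b, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i])
            count++;
    }
    return count;
}

/* Predicated shape: the comparison yields 0 or 1 and the add is always
   issued, which mirrors what CMP followed by ADDGT computes. */
int count_greater_predicated(const int *a, const int *b, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] > b[i]);
    return count;
}
```

Both functions return the same result; the only difference is whether the loop body contains a control-flow decision.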
Many instructions also have an "S" bit, which toggles whether or not they update the flags on which conditional execution depends. Taken together, these two features allow a clever assembly programmer to do some really clever things (but historically not too much effort has been directed at getting compilers to make really clever use of these features).
For low-power parts, this is a cute trick, as it allows a programmer to avoid stressing a limited branch predictor with lots of small branches. It does add some complication to the implementation however, especially when you get into designs that retire multiple instructions per cycle or support out-of-order execution, as conditional execution basically adds additional dependencies to every instruction.
I remember there was a "never" condition, which was present just for completeness; it turns out ARM eventually found that having 2^28 different NOPs would not be a good use of opcode space, so it's now a special extension for newer instructions...
I seem to recall from my ARM Assembler coding days that there was also a noop instruction, which of course could be conditional itself, so if you didn't actually want to do the NOOP, you could do NOOP-NE, which wouldn't do anything twice over.
From my days coding ARM assembly on the Acorn Archimedes, NOP was typically an alias for MOV R0,R0 (which effectively did nothing) rather than being its own instruction.
That's a beautiful explanation. It's one of my favorite things about the ARM instruction set.
That said, it also means debugging becomes a bit more painful. Let's say you want your (cheap) JTAG debugger to halt on the count++ instruction. You can hard break on that particular address in code, but you will always hit that address whether the condition was met or not.
This was part of the beauty of ARM when I learned it as a teenager back in the early 90s. Very simple and elegant, and writing ARM code by hand was enjoyable. Coming back to ARM now, though, in the form of Cortex-M microcontrollers, I see that it has become muddied with things like if-then-else instructions and mixed 16-bit/32-bit Thumb-2 code.
Simple, elegant, and it eats an astonishing 12.5% of instruction bandwidth (4 bits out of 32). A branch will require less space as soon as you want to conditionally execute more than 8 instructions (8 × 4 condition bits = 32 bits, the size of one branch instruction). On top of that, most code is executed unconditionally (in practice, maybe not most of the instructions if you look at the binary, but almost certainly most of the instructions if you count ones executed multiple times).
arm64 ... sort of ditches conditional execution. It’s not on every instruction any more, but it’s still available on more instructions than on most other arches.
To the usual complement of typical conditional instructions (branch, add/sub with carry, select and set), arm64 adds select with increment, negate, or inversion, the ability to conditionally set to -1 as well as +1, and the ability to conditionally compare and merge the flags in a fairly flexible manner (it’s really a conditional select of condition flags between the result of a comparison and an immediate). This actually preserves most of the power of conditional execution (except for really exotic hand-coded usages), while taking up much less encoding space.
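As a rough illustration of what that family of instructions computes, here is a small C model. The helper names simply echo the A64 mnemonics; they are not a real API, the condition is passed in as an already-evaluated boolean, and the flags are modelled as a plain 4-bit NZCV value, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSEL  Rd, Rn, Rm, cond : plain select */
uint64_t csel(bool cond, uint64_t rn, uint64_t rm)  { return cond ? rn : rm; }

/* CSINC Rd, Rn, Rm, cond : select, incrementing the "else" operand */
uint64_t csinc(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : rm + 1; }

/* CSINV Rd, Rn, Rm, cond : select, bitwise-inverting the "else" operand */
uint64_t csinv(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : ~rm; }

/* CSNEG Rd, Rn, Rm, cond : select, negating the "else" operand */
uint64_t csneg(bool cond, uint64_t rn, uint64_t rm) { return cond ? rn : (uint64_t)0 - rm; }

/* CSET / CSETM : conditionally set to 1 or to all ones (-1) */
uint64_t cset(bool cond)  { return cond ? 1 : 0; }
uint64_t csetm(bool cond) { return cond ? ~(uint64_t)0 : 0; }

/* CINC Rd, Rn, cond is an alias of CSINC with the condition inverted;
   this is the shape that covers the count++ example. */
uint64_t cinc(bool cond, uint64_t rn) { return csinc(!cond, rn, rn); }

/* CCMP: if cond holds, the new flags come from the comparison,
   otherwise they are taken from the immediate NZCV value. */
unsigned ccmp(bool cond, unsigned nzcv_from_compare, unsigned nzcv_imm)
{
    return cond ? nzcv_from_compare : nzcv_imm;
}
```

In this model the earlier loop body becomes count = cinc(a[i] > b[i], count): one compare plus one conditional increment, with no condition field on any other instruction.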
Your last sentence made me realise why ARM64 no longer has conditional instructions. Obviously it's designed for higher-power situations than most ARM cores, and out-of-order (OOO) execution is an important part of doing that efficiently. Reduced instruction deps == better OOO.
"Almost all ARM instructions can include an optional condition code. An instruction with a condition code is only executed if the condition code flags in the CPSR meet the specified condition. The condition codes that you can use are shown in Table 4.2."
For example: execute this instruction only if the previous instruction resulted in a negative number.
In ARM state, all instructions are conditionally executed according to the state of the
CPSR condition codes and the instruction’s condition field. This field (bits 31:28)
determines the circumstances under which an instruction is to be executed. If the state
of the C, N, Z and V flags fulfils the conditions encoded by the field, the instruction is
executed, otherwise it is ignored.
When a condition is set, the instruction's mnemonic is extended with a suffix such as EQ, NE, CS, or CC.
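The check described above can be sketched in C roughly as follows. This is a simplified model, not anything taken from ARM's documentation: the struct and function names are invented for illustration, the flags are held in a plain struct rather than the real CPSR, and 1111 is treated as the historical "never" condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four CPSR condition flags: Negative, Zero, Carry, oVerflow. */
struct flags { bool n, z, c, v; };

/* The condition field is bits 31:28 of an ARM-state instruction word. */
unsigned cond_field(uint32_t insn) { return insn >> 28; }

/* Does an instruction with this condition field execute under these flags? */
bool cond_passed(unsigned cond, struct flags f)
{
    switch (cond & 0xFu) {
    case 0x0: return f.z;                  /* EQ: equal */
    case 0x1: return !f.z;                 /* NE: not equal */
    case 0x2: return f.c;                  /* CS: carry set */
    case 0x3: return !f.c;                 /* CC: carry clear */
    case 0x4: return f.n;                  /* MI: negative */
    case 0x5: return !f.n;                 /* PL: positive or zero */
    case 0x6: return f.v;                  /* VS: overflow */
    case 0x7: return !f.v;                 /* VC: no overflow */
    case 0x8: return f.c && !f.z;          /* HI: unsigned higher */
    case 0x9: return !f.c || f.z;          /* LS: unsigned lower or same */
    case 0xA: return f.n == f.v;           /* GE: signed >= */
    case 0xB: return f.n != f.v;           /* LT: signed < */
    case 0xC: return !f.z && f.n == f.v;   /* GT: signed > */
    case 0xD: return f.z || f.n != f.v;    /* LE: signed <= */
    case 0xE: return true;                 /* AL: always */
    default:  return false;                /* 1111: modelled as "never" */
    }
}

/* Flags that CMP x, y leaves behind (it computes x - y and drops the result). */
struct flags flags_after_cmp(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint32_t r = ux - uy;
    struct flags f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (ux >= uy);                             /* set when there is no borrow */
    f.v = (((ux ^ uy) & (ux ^ r)) >> 31) & 1u;    /* signed overflow */
    return f;
}
```

As a usage example, ADDGT R2, R2, #1 assembles to 0xC2822001 and the unconditional ADD R2, R2, #1 to 0xE2822001, so cond_field returns 0xC (GT) and 0xE (AL) respectively, and cond_passed(0xC, flags_after_cmp(5, 3)) is true while cond_passed(0xC, flags_after_cmp(3, 5)) is false.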