by pcordes
3226 days ago
This is almost exactly the same as dmitryg's answer, using a state machine the same way. It would compile to nearly the same code, but with an extra instruction to sign-extend the first char into a register before adding. You did remove a level of indirection for the format strings, though. You could have done that with struct state { int next; char fmt[6]; };
Anyway, this probably has a 5-cycle loop-carried dependency chain on Skylake, from x += *x; compiling into a 4-cycle-latency movsx rax, byte [rdi], then a 1-cycle add rdi, rax (or whatever registers the compiler picks). If you'd stored pointers, you could have got it down to 4 cycles on Skylake, which is the load-use latency of a simple addressing mode ([reg + disp] where the displacement is 0..2047).
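To make the comparison concrete, here is a minimal sketch of the two ways of stepping the state machine. The table contents, the tag bytes, and the names rel_states, ptr_states, walk_relative and walk_pointers are hypothetical and not taken from either answer. It only illustrates the shape of the loop-carried dependency, not the actual code being discussed.

```c
#include <stddef.h>

/* Layout A (hypothetical): each state begins with a signed byte holding the
   offset to the next state, followed by a one-byte tag. */
static const signed char rel_states[] = {
    /* index 0 */  2, 'a',
    /* index 2 */  2, 'b',
    /* index 4 */ -4, 'c',   /* wraps back to index 0 */
};

/* Step n times with x += *x and return the tag of the state reached.
   Each step is a sign-extending byte load followed by an add, and both
   are on the loop-carried dependency chain. */
char walk_relative(int n)
{
    const signed char *x = rel_states;
    while (n-- > 0)
        x += *x;
    return (char)x[1];
}

/* Layout B (hypothetical): each state stores a pointer to the next one. */
struct pstate {
    const struct pstate *next;
    char tag;
};

static const struct pstate ptr_states[3] = {
    { &ptr_states[1], 'a' },
    { &ptr_states[2], 'b' },
    { &ptr_states[0], 'c' },   /* wraps back to the first state */
};

/* Step n times by pointer chasing and return the tag of the state reached.
   Each step is a single load whose result is the next address, so only the
   load-use latency is on the loop-carried dependency chain. */
char walk_pointers(int n)
{
    const struct pstate *p = ptr_states;
    while (n-- > 0)
        p = p->next;
    return p->tag;
}
```

Both functions visit the same cycle of three states. The trade-off is table size: the pointer version spends 8 bytes per link on x86-64 instead of 1.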