They have an example where this code is compiled:

    def sumitup(n):
        total = 0
        for i in range(n):
            total = total + i
        return total

It was optimized quite well, but still had loops. I know this is a lot to ask, but I would have expected it to be possible to specialize it to a loopless variant:

    def sumitup(n):
        if n < 0:
            return 0
        else:
            return n*(n-1) // 2

Clang 4+ actually finds this optimization, but gcc and icc don't seem to: https://godbolt.org/g/v4zhrm

That is, the assembly generated by clang seems to be equivalent to
which might even run faster on the CPU, due to the specific code emitted.
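The loopless variant relies on the identity 0 + 1 + ... + (n - 1) = n(n - 1)/2, plus the fact that `range(n)` is empty for negative `n`, so the loop returns 0 there. The sketch below checks that the two versions agree in plain Python. The names `sumitup_loop` and `sumitup_closed` are hypothetical, chosen only so both versions can coexist; this is a sanity check of the arithmetic, not a model of what any compiler actually emits.

```python
def sumitup_loop(n):
    """Original loop version: adds 0 + 1 + ... + (n - 1)."""
    total = 0
    for i in range(n):
        total = total + i
    return total


def sumitup_closed(n):
    """Loopless variant: closed form n*(n-1)/2 for the sum of 0..n-1."""
    if n < 0:
        # range(n) is empty for negative n, so the loop version returns 0
        return 0
    # n*(n-1) is always even, so floor division is exact here
    return n * (n - 1) // 2


# Compare both versions on a range of inputs, including negative ones.
for n in range(-5, 1000):
    assert sumitup_loop(n) == sumitup_closed(n), n

print(sumitup_closed(10))  # → 45
```

The `n < 0` guard matters: without it, a negative `n` would give a positive result from `n * (n - 1) // 2`, while the loop returns 0.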