I assume it's doing `(N-2)(N-3)/2 + 2N - 3` instead of `N(N-1)/2` due to overflow concerns? But couldn't `(N-2)(N-3)` overflow as well, so that this form only supports a slightly larger range of `N`?
In this assembly code it cannot overflow: N is a 32-bit integer and the multiplication produces a full 64-bit product, which is truncated to 32 bits only after the shift (the division by 2).
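To make that concrete, here is a rough C sketch of the two forms. The assembly itself isn't reproduced here, so this is only an approximation of what it does. I'm assuming an unsigned 32-bit `N` with `N >= 3`, and the names `tri_simple`, `tri_roundabout` and `check_formulas` are made up for illustration. Algebraically, `(N-2)(N-3)/2 + 2N - 3 = (N^2 - 5N + 6)/2 + 2N - 3 = N(N-1)/2`, so both should agree modulo 2^32 as long as the product is formed in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Simple form: N(N-1)/2, with the product widened to 64 bits
   before the shift and truncated to 32 bits afterwards. */
uint32_t tri_simple(uint32_t n)
{
    uint64_t wide = (uint64_t)n * (n - 1u);
    return (uint32_t)(wide >> 1);
}

/* Roundabout form: (N-2)(N-3)/2 + 2N - 3.
   Assumption: n >= 3, so n - 2 and n - 3 do not wrap around.
   The 64-bit product of two 32-bit values cannot overflow. */
uint32_t tri_roundabout(uint32_t n)
{
    uint64_t wide = (uint64_t)(n - 2u) * (n - 3u);
    return (uint32_t)(wide >> 1) + 2u * n - 3u;
}

/* Hypothetical helper: aborts if the two forms disagree for n. */
void check_formulas(uint32_t n)
{
    assert(tri_simple(n) == tri_roundabout(n));
}
```

If this sketch matches what the compiler emits, neither form has an overflow advantage once the multiply is 64-bit, which is what makes the choice of the longer formula puzzling.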
I can't figure out why it doesn't use the simpler formula (other than the optimizer being bad).