What's wrong with doing it in two passes? N iterations each doing 2 operations is exactly the same cost as 2*N iterations each doing 1 operation. Because multiplication is commutative.
This is true, but a big advantage of the single pass method is data reuse. Instead of loading each element twice, you load each element once. Per bit, reading/writing to external memory is massively more energy intensive than any compute op. Orders of magnitude more.
I think you meant 2N+2M vs 2N+M, but yes, that’s the point: not reading twice is cheaper because unlike in traditional big-O analysis compute is cheap but data access is very expensive.