by ptfoobar 4377 days ago
Second example: say you want to compute

(x + y) * z - w

where each of them is a long vector, but you want to keep the code readable and explicit. The naive way to do it will create lots of temporaries and unnecessary loops: every binary operation creates a temporary (and entails a loop). But if you have laziness, you just need one loop.

Can you elaborate on this? Why can laziness provide better code here?

2 comments

If each of the four variables is a vector, naively you might do (x + y) first, producing a new vector. Then you would multiply by z, producing a second new vector. Finally, subtract w, producing a third new vector. You have now iterated over the length of the vectors three times, and allocated three new vectors (two of which are no longer needed).

A better way to do all that would be to allocate a single result vector and populate it with the full computed expression for each element. This can be much faster for large vectors. Python's numexpr (among others) is designed to do just this.
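To make the difference concrete, here is a rough sketch in plain Python, with lists standing in for vectors. The function names `eager` and `fused` are made up for illustration, and this is not how numexpr works internally; it only shows the three-pass versus one-pass shape of the computation.

```python
def eager(x, y, z, w):
    # Three passes over the data, three newly allocated lists.
    t1 = [a + b for a, b in zip(x, y)]      # temporary: x + y
    t2 = [a * b for a, b in zip(t1, z)]     # temporary: (x + y) * z
    return [a - b for a, b in zip(t2, w)]   # result: (x + y) * z - w


def fused(x, y, z, w):
    # One pass over the data, one newly allocated list.
    return [(a + b) * c - d for a, b, c, d in zip(x, y, z, w)]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [2.0, 2.0, 2.0]
w = [1.0, 1.0, 1.0]

print(eager(x, y, z, w))  # [9.0, 13.0, 17.0]
print(fused(x, y, z, w))  # [9.0, 13.0, 17.0]
```

Both give the same answer; the fused version just never materializes `x + y` or `(x + y) * z` as whole vectors.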

1. Three loops: x + y, then (x + y) * z, then (x + y) * z - w.

2. One loop over i: (x[i] + y[i]) * z[i] - w[i]

although I don't think the loops are going to be the main overhead in this case.
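As for where laziness comes in: a lazy vector type can let you keep writing `(x + y) * z - w` while deferring all the work until the result is asked for. Below is a minimal sketch, assuming a hypothetical `Lazy` class (not from any real library) that records each operation as a per-element function and only loops once in `evaluate()`.

```python
import operator


class Lazy:
    """Hypothetical lazy vector: operators build a recipe, nothing is computed yet."""

    def __init__(self, at, n):
        self.at = at  # function mapping index i to the value of element i
        self.n = n    # vector length

    @classmethod
    def of(cls, data):
        # Wrap a concrete list as a leaf of the expression.
        return cls(lambda i: data[i], len(data))

    def _combine(self, other, op):
        if self.n != other.n:
            raise ValueError("length mismatch")
        # No loop and no temporary vector here, only a new per-element function.
        return Lazy(lambda i: op(self.at(i), other.at(i)), self.n)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __sub__(self, other):
        return self._combine(other, operator.sub)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def evaluate(self):
        # The only loop: each element of the full expression is computed in one go.
        return [self.at(i) for i in range(self.n)]


x = Lazy.of([1.0, 2.0, 3.0])
y = Lazy.of([4.0, 5.0, 6.0])
z = Lazy.of([2.0, 2.0, 2.0])
w = Lazy.of([1.0, 1.0, 1.0])

expr = (x + y) * z - w      # builds the expression, computes nothing
result = expr.evaluate()    # single pass, single allocation
print(result)               # [9.0, 13.0, 17.0]
```

In pure Python the closure calls cost more than they save, so treat this as a picture of the structure rather than a speedup. In a compiled setting the same idea can avoid writing out and re-reading the intermediate vectors, which is probably a bigger cost than the loop bookkeeping itself.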