| More generally, “stop storing state, unless it makes the program less complicated.” The first version is simple. The second version is more complicated. The simplicity is because the first version can be represented as a formula; imagine trying to write out the second version as a formula, and the complexity becomes obvious. (The addition loop would have to be written as a recurrence relation, which of course means every step depends on the previous step.) Complexity isn’t always obvious. It’s sometimes common in C programs to write a loop like dst[index++] = src[i], particularly in deeply nested for-loops. In my experience it’s almost always worth rewriting it so that the indices are computable entirely from the loop iteration variables, with no state. It helps you build a mental map of the operations, because you can think of it in terms of geometry (memcpy = copying rectangles onto other larger rectangles) whereas when the index is stateful it becomes quite a lot harder to visualize in higher dimensions. At least for me. We’ve been building a compiler at Groq for our custom ML hardware. (Roughly, “turn pytorch into our ISA with no extra programmer effort.”) I used to think of posts like this as “Well, modern compilers are so fast, who really cares about autovectorization?” — it turns out you care when you need to write your own compiler. :) MLIR is pretty cool. It makes a lot of transformations like this pretty easy. The MLIR “krnl” dialect can also automatically transform nested loops into tiled iteration. Graphics devs will know what I mean — no need to loop over 8x8 blocks of pixels manually, just write “for x in range(width): for y in range(height): …” and set the block size to 8. |
I had hoped that functional programming languages would lead us to automatic high-level parallelization. That hasn't happened much in the real world. But what we got is "write code like a functional programmer would even in a low-level language and the compiler and the ISA will parallelize it for you." That surprises me, but I'll take it.