Hacker News new | ask | show | jobs
by QuestionPixel 2402 days ago
I'm currently studying EE and would like to get into embedded programming/semiconductor design, could you explain the optimal way to loop over the pixels if not with two nested for loops? Would it be something related to how in memory it's effectively a 1-D array?
4 comments

It's a really deep problem actually! Doing y then x is better but it's far from "optimal".

These parallel programming course notes have an example of using z-order curves which is slightly better yet: http://ppc.cs.aalto.fi/ch2/v7/

And the best solution requires something like Halide https://halide-lang.org/ to find the best traversal order.

The loops should be inverted. It should be for y, then for x, which accesses the memory in a linear fashion instead of in stride.

This presentation (PDF) explains why: https://www.aristeia.com/TalkNotes/ACCU2011_CPUCaches.pdf

At the very least, the inner loop should be over X, since that is how the pixels are packed.
It is precisely the linear ordering in memory. Whether X-then-Y or Y-then-X is best depends upon if the data is stored row- or column-major.