Hacker News new | ask | show | jobs
by AliceHe2003 355 days ago
This summer, I set out to write a web-based image editor in C++ using WASM. (You can read more about it in my last post!)

One of the biggest performance bottlenecks? Merging layers.

The intuitive algorithm: process the final image layer by layer, iterating through each pixel position (x,y).

My friend proposed an alternative algorithm, that in theory sounded much better…

A tempting alternative (that didn’t work): process the final image pixel by pixel, evaluating all layers at once.

This approach initially seemed like a great idea - in the worst case, both of these algorithms process every single pixel in every single layer. In the average case, the second algorithm can skip over entire regions of pixels on lower layers.

But in practice, the performance was worse. When I was iterating layer by layer, there was no significant processing time that was noticeable by the human eye. Using the pixel by pixel algorithm, the image took more than 30 seconds to appear. Same worst-case time complexity, but way worse performance in practice.

Why? CPU cache misses.

The first method processes entire layers at a time. This means that it uses sequential memory access, leading to more cache hits, and so an overall better runtime.

The second method jumps between layers for each pixel. This causes random memory access, creating more cache misses, and so a significant delay to access data.

What I learned In theory, both layer-wise and pixel-wise merging have the same worst-case complexity. But in practice, memory access patterns make all the difference. By aligning your algorithm with how the CPU cache works—favouring sequential over random access—you can unlock massive speedups. Sometimes, the "intuitive" way just fits the hardware better.

https://github.com/alicehe2003/ImageEditor