| Cool, interesting links to code, thank you! I chased down the Intel paper that the code links to, which describes how it works, on archive.org.

https://web.archive.org/web/20110317025924/https://software....

It's not just about using SIMD instructions (which help) and laying out memory to optimize cache performance (which also helps). Most importantly, Gaussian blur is a "separable filter" that you can break up into a horizontal pass and a vertical pass, which together require far fewer memory references (on the order of two times the number of pixels times the kernel width, instead of the number of pixels times the kernel width squared).

IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions

>This white paper proposes an implementation for the Infinite Impulse Response (IIR) Gaussian blur filter [1] [2] [3] using Intel® Advanced Vector Extensions (Intel® AVX) instructions. [...]

>The IIR Gaussian blur filter applies equation (1) on each pixel through two sequential passes:
>The horizontal pass: This pass processes the input image left-to-right (row-wise), then right-to-left. The output of the left-to-right pass is added to the right-to-left pass.

>The vertical pass: Usually, the vertical pass processes the output from the horizontal pass top-to-bottom (column-wise), and then bottom-to-top. Accessing the input column-wise leads to a lot of cache blocks and impacts the performance of the filter. To avoid this, the horizontal pass transposes the output before writing to the output buffer. It makes the vertical pass similar to the horizontal pass and processes the intermediate output left-to-right, then right-to-left. The vertical pass again transposes the final output before writing the blurred image.

https://bartwronski.com/2020/02/03/separate-your-filters-svd...

>Separate your filters! Separability, SVD and low-rank approximation of 2D image processing filters
>Posted on February 3, 2020 by bartwronski

>In this blog post, I explore concepts around separable convolutional image filters: how can we check if a 2D filter (like convolution, blur, sharpening, feature detector) is separable, and how to compute separable approximations to any arbitrary 2D filter represented in a numerical / matrix form. I’m not covering any genuinely new research, but think it’s a really cool, fun, visual, interesting, and very practical topic, while being mostly unknown in the computer graphics community.

https://en.wikipedia.org/wiki/Gaussian_blur#Mathematics

>In addition to being circularly symmetric, the Gaussian blur can be applied to a two-dimensional image as two independent one-dimensional calculations, and so is termed a separable filter. That is, the effect of applying the two-dimensional matrix can also be achieved by applying a series of single-dimensional Gaussian matrices in the horizontal direction, then repeating the process in the vertical direction. In computational terms, this is a useful property, since the calculation can be performed in O(w_kernel w_image h_image) + O(h_kernel w_image h_image) time (where h is height and w is width; see Big O notation), as opposed to O(w_kernel h_kernel w_image h_image) for a non-separable kernel. |
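
To make the separability point concrete, here is a rough pure-Python sketch (mine, not taken from the Intel paper or the linked code). It uses a plain truncated FIR Gaussian kernel, not the paper's IIR recursion, and every function name in it is made up for illustration. It borrows one idea from the paper's description: the "vertical" pass is done by transposing, running the same row-wise pass, and transposing back. Edge pixels are clamped, which is just an assumption I picked so the two versions can be compared.

```python
import math


def gaussian_kernel_1d(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def clamp(i, n):
    """Clamp index i into the valid range 0..n-1 (edge pixels are repeated)."""
    return max(0, min(n - 1, i))


def blur_rows(image, kernel):
    """Convolve each row of the image with the 1D kernel."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[clamp(x + k, width)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns."""
    return [list(column) for column in zip(*image)]


def blur_separable(image, kernel):
    """Horizontal pass, transpose, same row-wise pass again, transpose back."""
    horizontal = blur_rows(image, kernel)
    vertical = blur_rows(transpose(horizontal), kernel)
    return transpose(vertical)


def blur_full_2d(image, kernel):
    """Reference version: apply the full 2D kernel (outer product) per pixel."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    weight = kernel[dy + radius] * kernel[dx + radius]
                    acc += weight * image[clamp(y + dy, height)][clamp(x + dx, width)]
            out[y][x] = acc
    return out


def taps_per_pixel(radius):
    """Pixel reads per output pixel: (separable, full 2D)."""
    size = 2 * radius + 1
    return (2 * size, size * size)


if __name__ == "__main__":
    kernel = gaussian_kernel_1d(radius=2, sigma=1.0)
    image = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(5)]
    separable = blur_separable(image, kernel)
    full = blur_full_2d(image, kernel)
    diff = max(abs(a - b) for ra, rb in zip(separable, full) for a, b in zip(ra, rb))
    print("max difference:", diff)
    print("taps per pixel (separable, full):", taps_per_pixel(2))
```

The two versions should agree up to floating-point rounding, while `taps_per_pixel` shows the gap the comment is talking about: the separable cost grows linearly with kernel width and the full 2D cost grows with its square. A real implementation would transpose in cache-friendly tiles and vectorize the row pass; this sketch only shows the structure.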