by diamondlovesyou 1555 days ago
This article focuses on loop vectorization, but another important area in this realm is vectorizing "straight-line code", sometimes called SLP vectorization. This sort of code often lives in loops too, so it usually depends on much of the same memory dependence analysis used for vectorizing successive iterations, but only uses it to find memory accesses that are sequential within a single iteration. This is important for vectorizing loop reductions (e.g. summing each channel in every pixel in an image), or, say, a 3D cross product. The techniques are built around matching expression trees or DAGs together, but get complicated because brute-forcing the search isn't practical in even medium-sized functions, among other reasons.
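
To make that concrete, here is a rough sketch in C of the two shapes mentioned above. The function names (`cross3`, `sum_channels`) and the RGBA8 layout are just assumptions for illustration, and whether any given compiler actually packs these into vector instructions depends on the target and flags:

```c
#include <stddef.h>
#include <stdint.h>

/* 3D cross product: three independent statements with the same
   multiply-multiply-subtract expression tree. Matching trees like
   these is what an SLP pass looks for in straight-line code. */
static void cross3(const float a[3], const float b[3], float out[3]) {
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}

/* Per-channel sum over an interleaved 4-channel, 8-bit image
   (layout assumed for this sketch). Within ONE iteration the four
   loads touch adjacent bytes and feed four identical adds, so the
   candidate vector lanes are the channels, not successive pixels. */
static void sum_channels(const uint8_t *px, size_t n_pixels, uint32_t sum[4]) {
    uint32_t r = 0, g = 0, b = 0, a = 0;
    for (size_t i = 0; i < n_pixels; ++i) {
        r += px[4 * i + 0];
        g += px[4 * i + 1];
        b += px[4 * i + 2];
        a += px[4 * i + 3];
    }
    sum[0] = r;
    sum[1] = g;
    sum[2] = b;
    sum[3] = a;
}
```

Note that the cross product reads its operands in a permuted order across the three statements, which hints at why tree matching alone isn't the whole story.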

Source: I work for AMD on MSVC, currently focused on renovating its SLP pass. I'm about to merge a patch into MSVC which boosts 538.imagick_r on Zen3 by 25%, which I'm pretty proud of.

1 comments

Do you know what the most promising next-gen autovectorization optimizations would be? Maybe polyhedral? https://polly.llvm.org/ https://en.wikipedia.org/wiki/Loop_optimization#The_polyhedr...
Beware that the Polly matmul example apparently pattern-matches and expands the loop structure to the Goto-like form rather than deriving it a priori. (So I was told when raising those results -- I haven't checked the code.)

GCC also has the Pluto-style optimization (-floop-nest-optimize) but I've never had it working successfully.
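
For anyone following along, the kind of input these polyhedral passes are aimed at is a plain affine loop nest. Below is a minimal sketch of a naive matmul in C; the name `matmul_naive` and the row-major layout are assumptions for illustration, not taken from the Polly example itself:

```c
#include <stddef.h>

/* Naive row-major matrix multiply: C = A * B, where A is n x k,
   B is k x m and C is n x m. All array subscripts are affine in the
   loop indices, which is the property polyhedral tools rely on. */
static void matmul_naive(size_t n, size_t m, size_t k,
                         const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * m + j];
            }
            C[i * m + j] = acc;
        }
    }
}
```

A tiled, packed Goto-style version computes the same result with a very different loop structure, which is why it matters whether the optimizer derives that structure or recognizes the pattern and substitutes it.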