Hacker News new | ask | show | jobs
by gww 1481 days ago
I am always amazed at the algorithms re-implemented using SIMD. One of my favorites is the Striped Smith-Waterman approach used for sequence alignment. Does anyone have any good resources on learning to use SIMD? I've found it heard to make the "plunge".
5 comments

The lectures and assignments for Oregon State's CS475 (Parallel Programming) are all available online. [0] There are lectures [1] and a project [2] about SIMD. I really enjoyed the entire course as a survey of parallel and high-performance computing. The full course covers multi-processing, multi-threading, caching, SIMD, GPUs (CUDA and OpenCL) and MPI. The projects are in C/C++ (along with OpenMP, CUDA and OpenCL). FYI, I think the last two projects use some large research GPU bank that you have to have special access to use, so you'd be out of luck on implementing the projects for those.

[0] https://web.engr.oregonstate.edu/~mjb/cs575/ [1] https://media.oregonstate.edu/media/t/1_7wju0jtq [2] https://web.engr.oregonstate.edu/~mjb/cs575/Projects/proj04....

Thanks these are really great. That's a course I wish my CS program had.
:) How about Chapter 12 of Agner's awesome manual (https://agner.org/optimize/optimizing_cpp.pdf), http://const.me/articles/simd/simd.pdf, https://en.algorithmica.org/hpc/simd/ ?

Does anyone have others to share?

These are really helpful too. Another reply to my question suggests a course that looks really good too.
I wrote that article some time ago: http://const.me/articles/simd/simd.pdf
This is helpful. Thank you
The performance benefit of SIMD implementation of Smith-Waterman-Gotoh is moot relative to the gains provided by a total reformulation of the pairwise sequence alignment problem.

The Wavefront Algorithm (WFA) flips the problem on its head by progressively exploring the best scoring alignment until a global alignment is attained. Then no more work needs to be done to fill the matrix. The total work is actually quadratic in sequence divergence rather than length, a huge improvement over SWG for almost all applications.

In WFA the data dependencies are trivial and compilers easily auto-vectorize the inner loop of the algorithm. It's also possible to implement this in linear memory relative to sequence divergence with a bidirectional approach (biWFA).

All this is to say that vectorization and SIMD hardware is cool, but new theory and approach can completely overwhelm it's potential benefits.

I agree with you about WFA. I was just very impressed when the striped SW first came out when I was still a fledgling PhD student.
Agner Fog's VCL library is extremely approachable. Just download it and get stuck in.

  Manual
  https://www.agner.org/optimize/vcl_manual.pdf

  GitHub
  https://github.com/vectorclass/version2
Even for trivial stuff like linebreaking input files, massive speedups are there for the taking.
Thank you