|
SIMD units are just a miniaturization of older vector processors as co-processors, à la Cray in a box. As an HPC sysadmin and scientific software developer/researcher, I can confidently say that SIMD can provide real performance gains, but there are trade-offs and decisions to be made. - First of all, SIMD is very data-hungry. You either need to constantly push data into it, or do a lot of work on the data you've already pushed. Otherwise the units just sit idle. - Then there's the power and frequency penalty. In Intel's case, SIMD needs a humongous amount of power in CPU budget terms, and it creates heat and slowdowns. So you have to test your code with and without SIMD (-mtune, -march, etc.). If the SIMD build is as fast or faster, use SIMD. - Moreover, you can't just compile an extremely optimized binary and fan it out. On older processors it will just throw "illegal instruction" and halt. You either provide multiple binaries with specific optimizations for each target, or a lowest-common-denominator build per vendor (an AMD binary and an Intel binary), or just throw it all out. The best way is to give out the source with a simple makefile and let the researcher/user compile it, but not all code is open, as one may guess. Creating a universal binary with multiple code paths is also possible, yet it needs a lot of elbow grease and may not always be optimal. - Lastly, your code doesn't have to be embarrassingly parallel to be able to use SIMD. Matrix/linear algebra libraries like Eigen can almost abuse all of the processor's units when compiled with the correct flags (-O3, -mtune=native, -march=native). However, if you want to accelerate small data with SIMD, you need to create a parallel loop that saturates the SIMD pipelines, which OpenMP can easily do with `#pragma omp parallel for`. None of this changes the fact that SIMD is a special horse which can't run on all courses; however, it's not useless. |
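To make the OpenMP point above concrete, here is a minimal sketch in C. The function name `saxpy` and its signature are just illustrative assumptions, not something from the post, and whether the loop really saturates the SIMD pipelines depends on your compiler, flags and data size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * When built with OpenMP enabled, the pragma splits the iterations across
 * threads and asks the compiler to vectorize each thread's chunk.
 * When OpenMP is not enabled, the pragma is ignored and the loop runs
 * serially with the same result. */
void saxpy(long n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Something like `gcc -O3 -march=native -fopenmp` would be the usual way to build this. Per the power/frequency point above, it is still worth timing it against a build without those flags before trusting the result.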
I think you're missing the point of my post. I agree with all your points in their specifics (except one, but this isn't the forum to discuss FMV in modern compilers), but they miss the grander point: Intel hasn't made computers faster via more SIMD. The amount of expertise required to make use of it is just more evidence of that.