by richardwhiuk
4104 days ago
Of course, the next step is obvious: work out why the compiler didn't do a four-way AVX unroll, and then submit a fix to clang so that it does. That way all of your future code benefits from your single micro-optimization. It's also possible you'll find that enabling --generate-for-haswell or some other arcane compiler flag makes it do that for you.
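For anyone who wants to try this, here is a rough sketch of how you might ask clang for that unroll and check whether it happened. The loop and the name `scale_add` are made up for illustration, not taken from the code being discussed, and `--generate-for-haswell` above is tongue-in-cheek; the closest real clang option I know of is `-march=haswell`.

```c
#include <stddef.h>

/* Hypothetical hot loop (not from the original discussion):
   out[i] = a * x[i] + y[i] for n floats. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float a, size_t n)
{
    /* Hint to clang: use 8 float lanes (one 256-bit AVX register) and
       interleave four copies of the vector body, i.e. a four-way unroll.
       This is only a request; the optimizer may still decline it. */
#if defined(__clang__)
#pragma clang loop vectorize_width(8) interleave_count(4)
#endif
    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

Building with something like `clang -O3 -march=haswell -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c scale_add.c` should make clang report the vectorization width and interleave count it chose, or explain why it skipped the loop. If it picks the four-way unroll without the pragma, the flag was all you needed; if it only does so with the pragma, the cost model is the place to start digging before filing anything upstream.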