by CoolGuySteve
2729 days ago
Last time I used SSE intrinsics, which was with GCC 4.9 I think, I had a lot of trouble with register usage. It looked like it was compiling everything down to a single SSE register instead of spreading the work across several. I tried the same algorithm on godbolt with a few Clang versions and it was slightly better, using two or three registers, but not by much. So I had to drop down to inline assembly. I wonder if GCC has improved since then.

Yeah, that's a common problem, and it leads to nasty dependency stalls. MSVC is just as bad, at least as of 2015. Haven't tried newer versions yet. Intel's ICC seems to generate good code most of the time.
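One source-level workaround that sometimes helps before reaching for inline assembly is to split a reduction across several independent accumulators. GCC and Clang won't reassociate floating-point adds on their own without flags like `-ffast-math`, so a loop with one accumulator stays a single serial dependency chain no matter how many XMM registers are free. A minimal sketch, assuming a simple float sum is representative of the kernel (the function names `sum_one_acc` and `sum_four_acc` are hypothetical; the intrinsics are the standard ones from `<xmmintrin.h>`):

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical helper: horizontal sum of the four lanes of an __m128. */
static float hsum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* One accumulator: each _mm_add_ps waits on the previous result,
   so the loop is bounded by addps latency, not throughput. */
float sum_one_acc(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    float s = hsum4(acc);
    for (; i < n; ++i) /* scalar tail */
        s += x[i];
    return s;
}

/* Four independent accumulators: the four adds per iteration do not
   depend on each other, so they can overlap in the pipeline and the
   compiler has a reason to keep them in separate XMM registers. */
float sum_four_acc(const float *x, size_t n)
{
    __m128 a0 = _mm_setzero_ps(), a1 = _mm_setzero_ps();
    __m128 a2 = _mm_setzero_ps(), a3 = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        a0 = _mm_add_ps(a0, _mm_loadu_ps(x + i));
        a1 = _mm_add_ps(a1, _mm_loadu_ps(x + i + 4));
        a2 = _mm_add_ps(a2, _mm_loadu_ps(x + i + 8));
        a3 = _mm_add_ps(a3, _mm_loadu_ps(x + i + 12));
    }
    for (; i + 4 <= n; i += 4) /* leftover full vectors */
        a0 = _mm_add_ps(a0, _mm_loadu_ps(x + i));
    float s = hsum4(_mm_add_ps(_mm_add_ps(a0, a1), _mm_add_ps(a2, a3)));
    for (; i < n; ++i) /* scalar tail */
        s += x[i];
    return s;
}
```

This doesn't guarantee any particular compiler version will allocate registers well, so it's still worth checking the output on godbolt. Also note that the two versions add in a different order, so for general inputs their results can differ in the last bits.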