by spc476
3274 days ago
I recently played around with some code I wrote in college [1] which involves a pair of intertwined functions. I decided to try out the SSE extensions on modern CPUs and write a vectorized version of the code. Try as I might, I could not beat GCC [2], which generated non-vectorized code. I chalk it up to not knowing how best to write optimized x86 code anymore (it's been years since I did any real assembly language programming). I might also be hitting some scheduling or pipeline issues; I just don't know.

[1] I described the code years ago here: http://boston.conman.org/2004/06/09.2

[2] I beat clang easily though.