Hacker News
by spc476 3274 days ago
I recently revisited some code I wrote in college [1] that involves a pair of intertwined functions, and decided to try writing a vectorized version of it using the SSE extensions on modern CPUs.

Try as I might, I could not beat GCC [2], which generated non-vectorized code. I chalk it up to not knowing how best to write optimized x86 code anymore (it's been years since I did any real assembly language programming). I might also be hitting some scheduling or pipeline issues; I just don't know.
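The comment doesn't show the code itself, so here is only a generic sketch of what hand-vectorizing a loop with SSE intrinsics looks like. The saxpy-style loop and the names `saxpy_scalar` and `saxpy_sse` are hypothetical stand-ins, not the intertwined functions from [1]. The sketch assumes an x86-64 target, where SSE is always available. The scalar function is the reference, and the SSE function processes four floats per iteration, finishing with a scalar tail loop.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */

/* Hypothetical scalar reference: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same computation using SSE, four floats per iteration. */
void saxpy_sse(float a, const float *x, float *y, size_t n)
{
    __m128 va = _mm_set1_ps(a);            /* broadcast a into all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);   /* unaligned load of x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);   /* unaligned load of y[i..i+3] */
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(y + i, vy);          /* write four results back */
    }
    for (; i < n; i++)                     /* scalar tail for n % 4 leftovers */
        y[i] = a * x[i] + y[i];
}
```

A loop this simple, with independent iterations, is one that GCC can often auto-vectorize on its own at `-O3`. Code where each step depends on the previous one, as intertwined functions may, gives SSE much less to work with. In that case a well-scheduled scalar version can come out ahead.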

[1] I described the code years ago here: http://boston.conman.org/2004/06/09.2

[2] I beat clang easily though.