by vardump 4072 days ago
I think CAS (compare-and-swap) is a pretty slow operation even without a LOCK prefix. You probably don't want to use it for anything other than inter-core synchronization.

If you have a lot of data to process, using SSE/AVX is a huge win. Conditional masking and the min/max instructions, for example, let you replace per-element branches with straight-line vector code.
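To make the masking and min/max point concrete, here is a minimal sketch using SSE intrinsics on floats. The function names `clamp_sse` and `replace_below_sse` are made up for illustration, and the code assumes an x86 target with SSE available; it is not tuned, just the basic shape.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_min_ps, _mm_max_ps, ... */

/* Hypothetical helper: clamp every element of data[0..n) into [lo, hi].
 * The vector loop uses min/max instead of two compares and branches. */
void clamp_sse(float *data, size_t n, float lo, float hi)
{
    const __m128 vlo = _mm_set1_ps(lo);
    const __m128 vhi = _mm_set1_ps(hi);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);        /* unaligned load of 4 floats */
        v = _mm_min_ps(_mm_max_ps(v, vlo), vhi);  /* max with lo, then min with hi */
        _mm_storeu_ps(data + i, v);
    }
    for (; i < n; ++i) {                          /* scalar tail for leftovers */
        float x = data[i];
        x = x < lo ? lo : x;
        data[i] = x > hi ? hi : x;
    }
}

/* Hypothetical helper: replace every element below threshold with fill.
 * The compare produces an all-ones/all-zeros mask per lane, and the
 * and/andnot/or sequence selects between fill and the original value. */
void replace_below_sse(float *data, size_t n, float threshold, float fill)
{
    const __m128 vth = _mm_set1_ps(threshold);
    const __m128 vfill = _mm_set1_ps(fill);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        __m128 mask = _mm_cmplt_ps(v, vth);            /* lanes where v < threshold */
        __m128 r = _mm_or_ps(_mm_and_ps(mask, vfill),  /* fill where mask is set */
                             _mm_andnot_ps(mask, v));  /* original value elsewhere */
        _mm_storeu_ps(data + i, r);
    }
    for (; i < n; ++i) {                               /* scalar tail for leftovers */
        if (data[i] < threshold)
            data[i] = fill;
    }
}
```

Calling `clamp_sse(buf, len, 0.0f, 1.0f)` clamps a buffer in place, four floats per iteration. The AVX versions look the same with `__m256` and the `_mm256_` intrinsics, processing eight floats at a time.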

SIMD is an especially big win in sorting: you can get a 10-40x speed-up by using a bitonic sorting network.
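For reference, a bitonic sorting network can be sketched in plain C as below. This is a scalar illustration only, not the SIMD version and not something that will show the speed-up by itself. The names `cmp_exchange` and `bitonic_sort` are made up, and the length is assumed to be a power of two. The point is that the sequence of compare-exchanges depends only on the indices, never on the data, and each compare-exchange is just a min and a max, which is what lets the steps map onto vector min/max instructions.

```c
#include <stddef.h>

/* Compare-exchange written as a min and a max:
 * afterwards *a holds the smaller value and *b the larger. */
static void cmp_exchange(int *a, int *b)
{
    int lo = *a < *b ? *a : *b;
    int hi = *a < *b ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Hypothetical helper: sort v[0..n) ascending with a bitonic network.
 * Assumption: n is a power of two. */
void bitonic_sort(int *v, size_t n)
{
    for (size_t k = 2; k <= n; k <<= 1) {          /* size of the runs being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1) {  /* distance between partners */
            for (size_t i = 0; i < n; ++i) {
                size_t l = i ^ j;                  /* partner index, fixed by i and j */
                if (l > i) {                       /* handle each pair once */
                    if ((i & k) == 0)
                        cmp_exchange(&v[i], &v[l]);  /* ascending run */
                    else
                        cmp_exchange(&v[l], &v[i]);  /* descending run */
                }
            }
        }
    }
}
```

In a SIMD implementation the inner loop over `i` is what gets replaced: several independent pairs are compared at once with one vector min and one vector max, plus shuffles to line the partners up.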