

by strictfp 4120 days ago

Nice article. It inspired me to look for a more straightforward way of optimizing, and I found the setcc class of instructions: http://www.nynaeve.net/?p=178

I'm thinking that this, combined with a CAS (CMPXCHG8B), could achieve the same, right?

Something like (pseudo):

  compare_with(4)
  if_equal_store(54)
  if_not_equal_store(2)
  return


vardump 4120 days ago

I think CAS is a pretty slow operation even without a LOCK prefix. You probably don't want to use it for purposes other than intercore synchronization.

If you have a lot of data to process, using SSE/AVX is a huge win: conditional masking and min/max instructions, for example.

SIMD is a huge win, especially in sorting: you can get a 10-40x speed-up by using a bitonic sorting network.


1_player 4120 days ago

Aren't setcc/cmov* instructions effectively similar to a branch? To compute the result, the preceding instruction that sets the flags still has to execute first.

I suppose these instructions don't flush the pipeline the way a mispredicted jump does, but they still have to wait until the previous instruction has executed.

So, from worst to best: jmp < setcc/cmov* < branchless conditionals


Scaevolus 4120 days ago

Conditional moves have data dependencies on their input arguments, but so do the "branchless" versions presented in the article.
