| HN Mirror

by cnvogel 4707 days ago

Interestingly, while x86-64 does not seem to have a single opcode for reversing the bits in a byte, it does have an instruction that arbitrarily shuffles the 16 bytes of a 128-bit SSE register [PSHUFB]. It just blows my mind how much data those SIMD instructions process or move around in relatively few clock cycles.

http://stackoverflow.com/a/9040426

http://www.intel.com/content/www/us/en/processors/architectu... (it's on page 1256 of 3251).
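
One common way to use PSHUFB for per-byte bit reversal is a 16-entry nibble lookup table. The sketch below only emulates the instruction's byte-selection rule in Python so the idea can be run anywhere; the names `pshufb`, `NIBBLE_REV` and `reverse_bits_in_bytes` are made up for illustration, and this is not necessarily the exact code in the linked answer.

```python
def pshufb(table: bytes, indices: bytes) -> bytes:
    """Emulate PSHUFB on 16-byte values (hypothetical helper, not a real API).

    Each result byte is table[index & 0x0F], or 0 when the index byte
    has its top bit set.
    """
    assert len(table) == 16 and len(indices) == 16
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)


# NIBBLE_REV[n] is the 4-bit value n with its four bits reversed.
NIBBLE_REV = bytes([0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF])


def reverse_bits_in_bytes(block: bytes) -> bytes:
    """Reverse the bits inside each of 16 bytes using two table lookups."""
    lo = bytes(b & 0x0F for b in block)   # low nibble of every byte
    hi = bytes(b >> 4 for b in block)     # high nibble of every byte
    rev_lo = pshufb(NIBBLE_REV, lo)       # reversed low nibbles
    rev_hi = pshufb(NIBBLE_REV, hi)       # reversed high nibbles
    # The reversed low nibble becomes the new high nibble, and vice versa.
    return bytes((l << 4) | h for l, h in zip(rev_lo, rev_hi))
```

On real hardware the two lookups would each be a single PSHUFB over all 16 bytes at once, with the nibble split and recombination done by a few vector AND, shift and OR instructions.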

stephencanon 4707 days ago

It’s actually shocking how long it took Intel to add PSHUFB to SSE. AltiVec (PPC) had the even-more-powerful vperm (an arbitrary shuffle that picks 16 result bytes from 32 source bytes) way back in 1999.
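
For comparison with PSHUFB, here is a rough Python emulation of the vperm selection rule as I understand it: the low five bits of each control byte pick one byte out of the 32-byte concatenation of two source registers. The function name and argument names are just for illustration.

```python
def vperm(a: bytes, b: bytes, control: bytes) -> bytes:
    """Emulate an AltiVec-style vperm (hypothetical helper, not a real API).

    a and b are the two 16-byte sources; control holds 16 selector bytes.
    Only the low 5 bits of each selector are used, so indices 0..15 read
    from a and indices 16..31 read from b.
    """
    assert len(a) == len(b) == len(control) == 16
    source = a + b
    return bytes(source[c & 0x1F] for c in control)
```

Because the selectors can reach into either source, one such permute can, for example, interleave or merge two registers, which takes more than one PSHUFB to do.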

chacham15 4707 days ago

The VAX (circa 1977) had polynomial evaluation as an instruction[1]. What is your point?

[1] http://en.wikipedia.org/wiki/VAX

stephencanon 4707 days ago

Like my sibling posted, the crazy CISCy instructions aren’t comparable because in general they were no faster than an equivalent sequence of simpler instructions. That’s not the case for permute; there are no “simpler” instructions that let you build an efficient permute. It’s one of the fundamental building blocks for efficient vector code -- that’s why it’s shocking that it was added to SSE so late.

nhaehnle 4707 days ago

The point is that bit twiddling can be much more efficient to implement in hardware because all you're doing is placing wires somewhere. The RBIT instruction in the article significantly speeds up an operation at very low hardware cost.
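
To give a feel for what a single bit-reverse instruction replaces, here is a sketch of the usual software fallback, assuming a 32-bit word: five rounds of masks and shifts that swap ever larger groups of bits. The name `rbit32` is made up for this example.

```python
def rbit32(x: int) -> int:
    """Reverse the bit order of a 32-bit value in software (illustrative)."""
    x &= 0xFFFFFFFF
    x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1)   # swap adjacent bits
    x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2)   # swap bit pairs
    x = ((x >> 4) & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4)   # swap nibbles
    x = ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8)   # swap bytes
    x = (x >> 16) | (x << 16)                               # swap half-words
    return x & 0xFFFFFFFF
```

In hardware the same result is just a fixed rewiring of 32 signal lines, with no logic in between.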

Polynomial evaluation does not fit into this pattern, because you need actual arithmetic operations to do it, and so a hardware polynomial evaluation instruction has no significant benefit over the corresponding sequence of explicit multiplications and additions.
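
The "corresponding sequence of explicit multiplications and additions" can be sketched with Horner's rule, assuming coefficients are given highest degree first. The function name `horner` is mine, and this is only meant to show the shape of the computation: one multiply and one add per coefficient, each depending on the previous result.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are ordered highest degree first."""
    result = 0
    for c in coeffs:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result
```

For example, `horner([2, -3, 1], 4)` evaluates 2x^2 - 3x + 1 at x = 4. A dedicated instruction still has to perform every one of those multiplies and adds, which is why it gains little over the explicit loop.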
