by aleph_minus_one
64 days ago
> And on x86, saturating addition can't be done in a tick

Perhaps I misunderstand your point, but I am rather sure that in SSE.../AVX... there do exist instructions for saturating addition:

* (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW

* (V)PHADDSW, (V)PHSUBSW
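For illustration, here is a minimal sketch of two of those instructions reached through the SSE2 intrinsics `_mm_adds_epu8` (PADDUSB) and `_mm_adds_epi16` (PADDSW). It assumes an x86-64 target, where SSE2 is baseline; the helper names `add_sat_u8x16` and `add_sat_i16x8` are made up for this example.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: unsigned saturating add of 16 bytes (PADDUSB).
   Each lane computes min(a + b, 255) instead of wrapping around. */
static void add_sat_u8x16(const uint8_t a[16], const uint8_t b[16],
                          uint8_t out[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}

/* Hypothetical helper: signed saturating add of 8 int16 lanes (PADDSW).
   Each lane is clamped to the range [-32768, 32767]. */
static void add_sat_i16x8(const int16_t a[8], const int16_t b[8],
                          int16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi16(va, vb));
}
```

Both helpers process a whole 128-bit register with a single saturating-add instruction, so the clamping costs nothing extra per lane.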
(...though, x86 does have (v)pmulhw for 16-bit input, so for 16-bit div-by-const the saturating option works out quite well.)
(And, for what it's worth, the lack of 8-bit multiplies on x86 means that the OP method of high-half-of-4x-width-multiply works out nicely for vectorizing division of 8-bit ints too.)
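To make the multiply-high idea concrete, here is a rough sketch of division by the constant 10, assuming an x86-64 target with SSE2. The first helper uses PMULHW (`_mm_mulhi_epi16`) on signed 16-bit lanes. The second widens 8-bit lanes to 16 bits and uses the unsigned sibling PMULHUW (`_mm_mulhi_epu16`), which is one possible reading of the high-half approach and may not match the original post's exact formulation. The divisor, the magic constants 0x6667 and 6554, and the helper names are choices made for this example, not taken from the thread.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: signed 16-bit x / 10, rounding toward zero.
   Assumed magic: 0x6667 = ceil(2^18 / 10), so x / 10 ~ (x * 0x6667) >> 18. */
static void div10_i16x8(const int16_t in[8], int16_t out[8]) {
    const __m128i magic = _mm_set1_epi16(0x6667);
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i hi = _mm_mulhi_epi16(x, magic); /* PMULHW: (x * magic) >> 16 */
    __m128i q = _mm_srai_epi16(hi, 2);      /* floor(x * magic / 2^18) */
    __m128i neg = _mm_srli_epi16(x, 15);    /* 1 in lanes where x < 0 */
    /* Adding 1 for negative inputs turns the floor into truncation. */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(q, neg));
}

/* Hypothetical helper: unsigned 8-bit x / 10 via 16-bit lanes.
   Assumed magic: 6554 = ceil(2^16 / 10); the high half of the 32-bit
   product is already the quotient for every x in [0, 255]. */
static void div10_u8x16(const uint8_t in[16], uint8_t out[16]) {
    const __m128i zero = _mm_setzero_si128();
    const __m128i magic = _mm_set1_epi16(6554);
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i lo = _mm_unpacklo_epi8(x, zero); /* zero-extend bytes 0..7 */
    __m128i hi = _mm_unpackhi_epi8(x, zero); /* zero-extend bytes 8..15 */
    lo = _mm_mulhi_epu16(lo, magic);         /* PMULHUW: (x * magic) >> 16 */
    hi = _mm_mulhi_epu16(hi, magic);
    /* Quotients are at most 25, so the saturating pack never clamps. */
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(lo, hi));
}
```

Since the input ranges are so small, both helpers can be checked exhaustively against C's `/` operator over all 65536 and all 256 inputs respectively.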