Hacker News
by bobmcnamara 773 days ago
IIRC, they have 128-bit alignment requirements, so they're tricky to autovectorize.

True: loads and stores mask off the bottom 4 bits of the address. They try to help the situation by including an instruction that can shift a pair of 128-bit registers by bytes.
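To make that concrete, here is a rough sketch in portable C of how an unaligned load can be built from two masked loads plus a byte shift across the register pair. It models the behavior rather than any real instruction set: `vec128`, `load_masked`, `shift_pair` and `load_unaligned` are all made-up names, and the byte order within the pair is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register as 16 bytes. */
typedef struct { uint8_t b[16]; } vec128;

/* Aligned-only load: the bottom 4 address bits are silently dropped,
   so this reads the 16-byte block that contains p. */
vec128 load_masked(const uint8_t *p) {
    const uint8_t *aligned =
        (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15);
    vec128 v;
    memcpy(v.b, aligned, 16);
    return v;
}

/* Treat lo followed by hi as one 32-byte value and return the
   16 bytes starting at byte offset `shift` (0..15). */
vec128 shift_pair(vec128 lo, vec128 hi, unsigned shift) {
    assert(shift < 16);
    uint8_t tmp[32];
    memcpy(tmp, lo.b, 16);
    memcpy(tmp + 16, hi.b, 16);
    vec128 v;
    memcpy(v.b, tmp + shift, 16);
    return v;
}

/* Unaligned load assembled from two masked loads and one pair shift.
   Using p + 15 for the second load keeps it inside the first block
   when p is already aligned, so nothing past p + 15 is touched. */
vec128 load_unaligned(const uint8_t *p) {
    unsigned shift = (unsigned)((uintptr_t)p & 15);
    vec128 lo = load_masked(p);
    vec128 hi = load_masked(p + 15);
    return shift_pair(lo, hi, shift);
}
```

With a 16-byte-aligned buffer `buf`, `load_masked(buf + 5)` quietly returns `buf[0]` through `buf[15]`, while `load_unaligned(buf + 5)` returns `buf[5]` through `buf[20]`. That's three operations per vector instead of one, and the compiler has to prove or guess the alignment, which is a big part of why autovectorizing for this is awkward.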
That sounds really familiar. Maybe AltiVec did that? I remember it doing something like that, but I wish it would just fault.