by janwas
90 days ago
(Personal opinion)
I get the impression that RISC-V-related discussions often lack awareness of prior work/alternatives. A large amount of (x86) software actually uses our Highway library to run on whatever vector sizes and instructions the CPU offers. This works quite well in practice. As to leaving performance on the table, it seems RVV has some egregious performance differences/cliffs. For example, should we use vrgather (with what LMUL), or interesting workarounds such as widening+slide1, to implement a basic operation such as interleaving two vectors?
|
Use Zvzip. In the meantime:
zip: vwmaccu.vx(vwaddu.vv(a, b), -1, b), or segmented load/store when you are touching memory anyway
unzip: vnsrl
trn1/trn2: masked vslide1up/vslide1down with even/odd mask
The only one of those that base RVV handles badly is register-to-register zip, which takes twice as many instructions as on other ISAs. Zvzip gives you dedicated instructions for all of the above.
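To see why the widening trick interleaves: vwaddu.vv produces a + b in a double-width element, and vwmaccu.vx with the scalar -1 (read as the unsigned value 2^SEW - 1) adds (2^SEW - 1) * b, so each wide element ends up as a + b * 2^SEW, i.e. a in the low half and b in the high half. Below is a rough scalar model of that arithmetic in Python, not RVV code. The helper names, SEW = 8, and the little-endian reinterpretation of wide elements are assumptions made for illustration; the narrowing shift models what the RVV spec calls vnsrl.

```python
SEW = 8                              # element width in bits (assumed for illustration)
MASK = (1 << SEW) - 1                # mask for one narrow element
WIDE_MASK = (1 << (2 * SEW)) - 1     # mask for one double-width element


def vwaddu_vv(a, b):
    """Model of a widening unsigned add: results are 2*SEW bits wide."""
    return [(x + y) & WIDE_MASK for x, y in zip(a, b)]


def vwmaccu_vx(acc, scalar, b):
    """Model of a widening unsigned multiply-accumulate: acc + scalar * b."""
    s = scalar & MASK                # -1 becomes 2^SEW - 1 when read as unsigned
    return [(d + s * y) & WIDE_MASK for d, y in zip(acc, b)]


def zip_vectors(a, b):
    """Interleave a and b: a + b + (2^SEW - 1) * b == a + (b << SEW)."""
    wide = vwmaccu_vx(vwaddu_vv(a, b), -1, b)
    out = []
    for w in wide:                   # view each wide element as two narrow ones
        out += [w & MASK, w >> SEW]
    return out


def vnsrl(wide, shift):
    """Model of a narrowing logical right shift: keep the low SEW bits."""
    return [(w >> shift) & MASK for w in wide]


def unzip_vectors(v):
    """Split even and odd elements by viewing pairs as wide elements."""
    wide = [v[i] | (v[i + 1] << SEW) for i in range(0, len(v), 2)]
    return vnsrl(wide, 0), vnsrl(wide, SEW)


a = [1, 2, 3, 255]
b = [10, 20, 30, 255]
zipped = zip_vectors(a, b)
evens, odds = unzip_vectors(zipped)
```

The model also shows where the extra cost comes from: zip needs two widening instructions to build the wide element, while unzip is one narrowing shift per output vector.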