by dsharlet
1014 days ago
> The only case I can currently foresee where using LMUL=1 and manually unrolling instead will likely be always beneficial is vrgather operations that don't need to cross between registers in a register group (e.g. byte swapping).

What about algorithms where register pressure is an issue? I think the problem with LMUL is that it assumes you always want to unroll the innermost dimension (where the vector loads are stride 1). That's usually the last dimension I try to unroll, if there are any registers left over. If data is shared across any other dimension in the algorithm, it's better to tile/unroll those dimensions first. Of course, for a simple algorithm, there will be registers left over. But I think more interesting algorithms will struggle on RVV if you must use LMUL > 1 for performance.