|
|
|
|
|
by dzaima
677 days ago
|
|
> Packed SIMD doesn't even have LMUL>1, so the comparison here is that you are usually the same as Packed SIMD, but can sometimes be better which isn't a terrible place to be. Packed SIMD not having LMUL means that hardware can't rely on it being used for high performance; whereas some of the theadvector hardware (which could equally apply to rvv1.0) already had VLEN=128 with 256-bit ALUs, thus having LMUL=2 have twice the throughput of LMUL=1. And even above LMUL=2 various benchmarks have shown improvements. Having a compiler output multiple versions is an interesting idea. Pretty sure it won't happen though; it'd be a rather difficult political mess of more and more "please add special-casing of my hardware", and would have the problem of it ceasing to reasonably function on hardware released after being compiled (unless like glibc or something gets some standard set of hardware performance properties that can be updated independently of precompiled software, which'd be extra hard to get through). Also P-cores vs E-cores would add an extra layer of mess. There might be some simpler version of just going by VLEN, which is always constant, but I don't see much use in that really. |
|