What's the "nice to have" feature of vl=0 not modifying registers? I can't see any benefit from it. If anything, it's worse, due to the problems with reductions and vmv.s.x.
"Nice to have" because it removes the need for a branch for the n=0 case. For regular loops you probably still want it, but there are situations where not needing to worry about vl=0 corrupting your data is somewhat nice.
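To make the n=0 point concrete, here is a rough sketch in plain C rather than real RVV intrinsics. The names `VLMAX`, `model_vsetvl`, `model_vadd` and `add_tail` are made up for illustration, and the model only assumes that an operation with vl=0 touches no destination elements. The scalar `for` loop stands in for a single vector instruction, so it is not the branch being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 8 /* hypothetical number of 32-bit elements per vector register */

/* Model of vsetvli (assumption: vl = min(n, VLMAX)). */
size_t model_vsetvl(size_t n) { return n < VLMAX ? n : VLMAX; }

/* Model of load + vadd + store: only elements [0, vl) are written.
 * With vl == 0 nothing is written at all. */
void model_vadd(int32_t *dst, const int32_t *a, const int32_t *b, size_t vl) {
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++) {
        dst[i] = a[i] + b[i];
    }
}

/* Handles a tail of n < VLMAX elements with no `if (n == 0)` guard:
 * when n == 0, vl is 0 and dst is left untouched. */
void add_tail(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    size_t vl = model_vsetvl(n);
    model_vadd(dst, a, b, vl);
}
```

Calling `add_tail(dst, a, b, 0)` leaves `dst` as it was, which is the only property the sketch is meant to show. It says nothing about how reductions or vmv.s.x behave.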
Huh, in what situation would vl=0 clobbering registers be undesirable while for vl≥1 it's fine?
If hardware will be predicting vl, I'd imagine that would break down anyway. Potentially catastrophically so, if hardware always chooses to predict that vl=0 doesn't happen.