No, winning 4 and losing 6, by a small margin, isn't "being worse than arm". The paper's authors even explicitly conclude it is not losing to ARM.
This is even ignoring whether code is within or outside loops, counting fuseable instructions as always non-fused, and not considering any instructions from extensions after 2019's ratified (actually unchanged from 2017) rv64g... any of those would have a favorable effect on RISC-V.
This is an excellent result for RISC-V, that clears any doubts in terms of path length. On top of what we already know about RISC-V leading in code density in 64bit.
Might not be "worse" (I'd definitely agree that the difference is plenty small enough to be considered equal within error bounds), but is certainly not something worthy of RISC-V being noted as doing "great" either.
Excluding extensions is perhaps a significant question, but, for example, Debian RISC-V currently targets rv64gc, which should have the same instruction counts as rv64g does, so software compiled for Debian can't use the later extensions for most code anyway. (never mind that ARMv8 also has excluded extensions, namely NEON, which is always present on ARMv8 and is not designed to be ignored)
And, of course, even being better than ARM is not equivalent to being the best it could be; ARMv8 isn't some attempt at a magical optimal instruction set, it's designed for whatever ARM needed, and that includes being able to efficiently share hardware with ARMv7 for backwards compatibility.
ARM of course would also benefit from fusion too; but camel-cdr's mention of it being only rv64g is a pretty significant caveat.