Hacker News
by maxwell86 1432 days ago
This is not really a problem.

While it is true that you can only use masks from v0, and this requires moving masks into v0 after calling a vector instruction, those moves don't actually copy data from one register to another. Instead, they just "rename" registers.

So...

    ...generate mask into v2... v2, ....  <- put mask here
    
    mov v0, v2  <- move mask into v0
    vadd ... <- vector instruction, always use v0
doesn't really put some bits into v2, then copy them to v0, and then call the vector instruction.

Instead, the mov v0, v2 just disappears due to a register rename (e.g. v2 gets renamed as v0 for vadd), and vadd picks the mask directly from the register that was previously called v2 but is now called v0.
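To make the renaming idea concrete, here is a minimal toy sketch in Python. It is not how any particular core works: the `RenameTable` class and its method names are made up for illustration, and real hardware also has to track when physical registers can be freed, which is left out here.

```python
class RenameTable:
    """Toy model of move elimination via register renaming (hypothetical)."""

    def __init__(self, num_arch_regs=32):
        # Each architectural register v0..v31 starts mapped to its own
        # physical register.
        self.mapping = {f"v{i}": i for i in range(num_arch_regs)}
        self.next_phys = num_arch_regs

    def write(self, arch_reg):
        # An instruction that produces a value gets a fresh physical register.
        phys = self.next_phys
        self.next_phys += 1
        self.mapping[arch_reg] = phys
        return phys

    def move(self, dst, src):
        # Move elimination: only the mapping changes, no data is copied.
        self.mapping[dst] = self.mapping[src]

    def read(self, arch_reg):
        # A consumer looks up which physical register currently backs the name.
        return self.mapping[arch_reg]


rt = RenameTable()
mask_phys = rt.write("v2")  # generate mask into v2
rt.move("v0", "v2")         # mov v0, v2
used = rt.read("v0")        # vadd reads its mask through v0
```

In this model, `used` is the same physical register that the mask was generated into, so the `mov` never touched the mask bits themselves.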

Any CPU would implement register renaming before even thinking of adding vector registers. So it is fair to assume that every CPU that implements the RISC-V V extension supports it.

2 comments

The article says as much.

> Although this may seem like a significant drawback, it's actually not that alarming, at least not to me. While it's true that you must insert various "move mask to v0" instructions that wouldn't otherwise be there, it's important to remember that these will not really be actual computation instructions. Moves from one vector register to another will always be simple register renames handled by the front end of any high-performance chip, and I would consider it highly unlikely that you would change masks so frequently as to overburden the front end.

This is not the point the author was making.

The author claims that if you only write to the lower 32 bits of the v0 register, which could be 1024 bits wide, the hardware somehow has to allocate a 1024-bit-wide register to back those up, and then makes some "locality" arguments.

The hardware can back up the 1024-bit register with a pool of 32-bit registers, and if you only wrote to the first 32 bits and all the others are zero, it can use a single 32-bit register to back it up, making this "as good" as the single mask register solution, which the author thinks is good.
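A rough software sketch of that pooling idea, in Python. Everything here is an assumption for illustration: the `ChunkPool` name, the reserved chunk id 0 for an all-zero chunk, and above all the fact that the model detects zero chunks simply by looking at the value, which says nothing about how cheaply real hardware could do the same.

```python
CHUNK_BITS = 32
REG_BITS = 1024
CHUNKS_PER_REG = REG_BITS // CHUNK_BITS  # 32 chunks per wide register
CHUNK_MASK = (1 << CHUNK_BITS) - 1


class ChunkPool:
    """Toy pool of 32-bit chunks backing wide registers (hypothetical)."""

    def __init__(self):
        self.chunks = {}   # chunk id -> 32-bit value
        self.next_id = 1   # id 0 is reserved for a shared all-zero chunk

    def _alloc(self, value):
        # Zero chunks share id 0 and consume no storage in the pool.
        if value == 0:
            return 0
        cid = self.next_id
        self.next_id += 1
        self.chunks[cid] = value
        return cid

    def store(self, value):
        # Split a wide value into 32-bit chunks, lowest bits first.
        return [
            self._alloc((value >> (i * CHUNK_BITS)) & CHUNK_MASK)
            for i in range(CHUNKS_PER_REG)
        ]

    def load(self, chunk_ids):
        # Reassemble the wide value; id 0 reads back as zero.
        value = 0
        for i, cid in enumerate(chunk_ids):
            value |= self.chunks.get(cid, 0) << (i * CHUNK_BITS)
        return value

    def allocated(self):
        return len(self.chunks)


pool = ChunkPool()
v0_ids = pool.store(0xDEADBEEF)  # a mask that fits in the low 32 bits
```

With only the low 32 bits set, the 1024-bit register is backed by a single allocated chunk in this model, which is the "as good as a dedicated mask register" case described above.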

Determining that you only wrote to the bottom 32 bits of the register being copied from is hard for the hardware; and even if the compiler can see it, it has no way to tell the hardware.
I know of a lot of embedded designs which do not do register renaming.
I doubt any of those are implementing vector instructions. If they do, then renaming is a relatively small addition in comparison.
Vector instructions are replaced with DSP instructions for embedded applications. Non-speculating SoCs with DSPs seem pretty standard?