by maxwell86
1432 days ago
If you only write to the lower 32 bits of the v0 register, which could be 1024 bits wide, the author claims that the hardware somehow has to allocate a full 1024-bit-wide register to back it, and then makes some "locality" arguments based on that. But the hardware can back the 1024-bit register with a pool of 32-bit registers, and if you only wrote to the first 32 bits and all the others are zero, it can use a single 32-bit register to back it. That makes this "as good" as the single mask register solution, which the author thinks is good.
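
To make the idea concrete, here is a rough toy model in Python. It is only a sketch of the bookkeeping, not a description of how any real core does it: the class name `ChunkedRegisterFile`, the pool size, and the "all-zero chunks need no backing" rule are all my own assumptions for illustration.

```python
CHUNK_BITS = 32
REG_BITS = 1024
CHUNKS_PER_REG = REG_BITS // CHUNK_BITS  # 32 chunks per architectural register
CHUNK_MASK = (1 << CHUNK_BITS) - 1


class ChunkedRegisterFile:
    """Hypothetical model: wide registers backed by a shared pool of 32-bit slots.

    Assumption: a chunk that is all zero is not backed by any physical slot.
    """

    def __init__(self, pool_size):
        self.slots = [0] * pool_size              # physical 32-bit storage
        self.free_slots = list(range(pool_size))  # slots not currently mapped
        self.mapping = {}                         # (reg, chunk index) -> slot

    def write(self, reg, value):
        value &= (1 << REG_BITS) - 1
        # Release whatever slots previously backed this register.
        for i in range(CHUNKS_PER_REG):
            slot = self.mapping.pop((reg, i), None)
            if slot is not None:
                self.free_slots.append(slot)
        # Allocate a slot only for chunks that hold a nonzero value.
        for i in range(CHUNKS_PER_REG):
            chunk = (value >> (i * CHUNK_BITS)) & CHUNK_MASK
            if chunk:
                if not self.free_slots:
                    raise RuntimeError("physical slot pool exhausted")
                slot = self.free_slots.pop()
                self.slots[slot] = chunk
                self.mapping[(reg, i)] = slot

    def read(self, reg):
        value = 0
        for i in range(CHUNKS_PER_REG):
            slot = self.mapping.get((reg, i))
            if slot is not None:  # unmapped chunks read as zero
                value |= self.slots[slot] << (i * CHUNK_BITS)
        return value

    def slots_used(self, reg):
        return sum(1 for (r, _) in self.mapping if r == reg)


rf = ChunkedRegisterFile(pool_size=64)
rf.write("v0", 0xDEADBEEF)           # only the low 32 bits are nonzero
low_only_slots = rf.slots_used("v0")
rf.write("v1", (1 << REG_BITS) - 1)  # every chunk is nonzero
full_slots = rf.slots_used("v1")
```

In this model a write that only touches the low 32 bits of v0 costs one physical slot, the same as a dedicated narrow mask register would, while a fully populated 1024-bit value costs all 32.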