Performance per watt isn’t so useful for a GPU. People training ML algorithms would gladly increase power consumption if they could train larger models or train models faster.
And that's exactly my point: they can't. Power does not solve contention and latency! It's over, permanently... (or atleast until some photon/quantum alternative, which honestly we don't have the energy to imagine, let alone manufacture, anymore)