The original comment by brandmeyer said "ASIC", not "GPU".
Take the same RTL. Synthesize it for ASIC and for FPGA. Observe a 20x difference after normalizing for power, area, and clock speed.