A dedicated GPU with more die area and its own large fan will run both cooler and quieter when its power limit is reduced to match the performance of the Apple chip.
Die area matters because it determines how much performance you can still get out of the chip when you lower its power limit to match the (not actually) more efficient integrated chip.
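A rough way to see why die area helps, as a toy model rather than a measurement: dynamic power goes roughly as C·V²·f, and if you assume voltage has to rise about linearly with clock (a simplification), power per unit grows with the cube of clock speed while throughput only grows linearly. The function names and the "units" abstraction here are made up for illustration.

```python
def relative_throughput(units: float, clock: float) -> float:
    """Throughput relative to a baseline of 1 unit at clock 1.0."""
    return units * clock


def relative_power(units: float, clock: float) -> float:
    """Dynamic power relative to baseline, assuming P ~ C * V^2 * f
    and (simplifying assumption) V scaling linearly with f."""
    voltage = clock  # assumed linear voltage/frequency relation
    return units * voltage ** 2 * clock


# Narrow chip: 1 unit of die area at full clock.
narrow = (relative_throughput(1, 1.0), relative_power(1, 1.0))

# Wide chip: 2 units of die area at half clock.
wide = (relative_throughput(2, 0.5), relative_power(2, 0.5))

print("narrow (throughput, power):", narrow)
print("wide   (throughput, power):", wide)
```

Under those assumptions the wide chip delivers the same throughput at a fraction of the power. Real chips have leakage, voltage floors and memory power, so the actual gap is smaller, but the direction is the same.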
If Apple clocked their GPU to match the performance of a comparable dedicated chip, it would be just as inefficient, noisy and hot. Except they cannot even do that. They turned a limitation of the design into a supposed feature.
It doesn’t matter if the cooling solution gets the chip’s surface temperature lower if it’s heating the room twice as fast.
Chip surface temperature is not a useful metric for this purpose.
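To put numbers on it, here is a small sketch using the usual steady-state approximation: chip temperature is ambient plus power times the cooler's thermal resistance, while heat dumped into the room is just power times time, no matter how good the cooler is. The wattages and thermal resistances below are hypothetical values picked for illustration, not measurements of any real chip.

```python
def chip_temp_c(ambient_c: float, power_w: float, r_theta_k_per_w: float) -> float:
    """Steady-state chip temperature: ambient + power * thermal resistance."""
    return ambient_c + power_w * r_theta_k_per_w


def room_heat_j(power_w: float, seconds: float) -> float:
    """Heat released into the room; essentially all electrical power ends up as heat."""
    return power_w * seconds


AMBIENT_C = 25.0
HOUR_S = 3600.0

# Hypothetical chip A: low power, small cooler (high thermal resistance).
a_temp = chip_temp_c(AMBIENT_C, power_w=50.0, r_theta_k_per_w=1.5)
a_heat = room_heat_j(50.0, HOUR_S)

# Hypothetical chip B: twice the power, big cooler (low thermal resistance).
b_temp = chip_temp_c(AMBIENT_C, power_w=100.0, r_theta_k_per_w=0.5)
b_heat = room_heat_j(100.0, HOUR_S)

print(f"A: {a_temp:.0f} C at the chip, {a_heat / 1000:.0f} kJ into the room per hour")
print(f"B: {b_temp:.0f} C at the chip, {b_heat / 1000:.0f} kJ into the room per hour")
```

Chip B reads cooler at the surface and still puts twice the heat into the room, which is why power draw at a given performance level is the number to compare.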