The power consumption of an individual core varies greatly with its clock speed and voltage, and the voltage can be reduced when running at a lower clock speed. Since dynamic power scales roughly with voltage squared times frequency, a modest drop in clock speed saves a disproportionate amount of power. That makes putting 64 cores into the same TDP as their 32-core chip actually pretty easy: if you actively use all 64 cores, they'll be running at a lower clock speed than if you were only using 32 cores.
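
To put a rough number on that, here is a back-of-the-envelope sketch, not a model of any real chip. It assumes dynamic power per core goes as C·V²·f and that voltage can be scaled roughly in proportion to clock speed, so per-core power goes as the cube of the clock. It ignores leakage and uncore power entirely. The function names and the default exponent of 3 are my own assumptions for illustration.

```python
def per_core_clock_ratio(cores_before: int, cores_after: int,
                         exponent: float = 3.0) -> float:
    """Relative per-core clock when a fixed power budget is split across more cores.

    Assumption (hypothetical model): per-core power scales as clock**exponent.
    exponent=3 corresponds to P ~ C * V^2 * f with V scaled in proportion to f.
    """
    # Fixed budget: cores_before * f_before**e == cores_after * f_after**e,
    # so f_after / f_before = (cores_before / cores_after) ** (1 / e).
    return (cores_before / cores_after) ** (1.0 / exponent)


def aggregate_throughput_ratio(cores_before: int, cores_after: int,
                               exponent: float = 3.0) -> float:
    """Relative all-core throughput, assuming perfect scaling with cores * clock."""
    clock = per_core_clock_ratio(cores_before, cores_after, exponent)
    return (cores_after * clock) / cores_before


if __name__ == "__main__":
    clock = per_core_clock_ratio(32, 64)
    total = aggregate_throughput_ratio(32, 64)
    print(f"per-core clock: {clock:.2f}x, all-core throughput: {total:.2f}x")
```

Under those assumptions, doubling the core count at the same power budget only costs about 21% of the per-core clock (a ratio of about 0.79), while all-core throughput still goes up by roughly 1.59x for work that scales perfectly. Real chips won't follow the cube law exactly, since voltage can't be lowered indefinitely, but the direction of the trade-off is the same.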