CPUs are starting to move toward 2.5D and 3D packaging, but for the foreseeable future, thermal limitations will keep a fully 3D processor from being viable. Counting transistors per unit volume doesn't offer much over counting transistors per unit area when, at most, we're stacking some cache on top of the CPU logic. (In the near term, we're really just placing chiplets side by side and linking them with interconnects whose performance approaches that of communication across a large monolithic die.)
You're totally right that heat is the enemy of 3D designs. They will only work at low power, or if nanoscale heat pipes can be integrated into the design.
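A rough way to see why stacking hurts: if identical layers are stacked on the same footprint, and we assume (simplistically) that all the heat has to leave through that one footprint, the heat flux grows linearly with the number of layers. The function name and the 100 W / 200 mm² figures below are hypothetical, chosen only for illustration:

```python
def footprint_heat_flux(power_per_layer_w: float, footprint_mm2: float, layers: int) -> float:
    """Heat flux (W/mm^2) through the package footprint when `layers` identical
    dies are stacked on the same area. Simplifying assumption: all heat exits
    through that single footprint, with no extra cooling paths between layers."""
    if footprint_mm2 <= 0 or layers < 1:
        raise ValueError("footprint must be positive and there must be at least one layer")
    return power_per_layer_w * layers / footprint_mm2


# Hypothetical die: 100 W per layer on a 200 mm^2 footprint.
for n in (1, 2, 4):
    print(f"{n} layer(s): {footprint_heat_flux(100.0, 200.0, n):.2f} W/mm^2")
# prints 0.50, 1.00 and 2.00 W/mm^2 respectively
```

Doubling the layer count doubles the flux the cooler has to remove through the same area, which is why stacking is mostly limited to low-power layers such as cache unless heat can be extracted from inside the stack.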
This should be upvoted to the top of the discussion. The article never suggests what should replace nanometers in the node name. This is a great suggestion: millions of transistors per square millimeter (MTr/mm²).
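The metric itself is simple arithmetic: transistor count divided by one million, divided by die area in mm². A minimal sketch, using made-up numbers rather than any real chip's specs:

```python
def density_mtr_per_mm2(transistor_count: int, die_area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimeter."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return transistor_count / 1e6 / die_area_mm2


# Hypothetical die: 10 billion transistors on 100 mm^2.
print(density_mtr_per_mm2(10_000_000_000, 100.0))  # 100.0
```

Unlike a "nm" label, this number can be checked against published transistor counts and die sizes.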
Different applications would probably use different scores, based on a table of numbers.
+++
Count of transistors within a single layer (e.g. what's linked above)
Count of transistors within a 3D volume
Performance (FLOPS? some other standard?) of that volume under various thermal conditions, power constraints, etc.
Latency at the edges (and pins) during a common test pattern, under each of the profiles above.
Efficiency: how much power is consumed in each of the above states.
Weight: sometimes it matters, so it should be measured and published even if it isn't used in the score.
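One way to picture such a table is as a small record per chip, with one entry per test profile. This is only a sketch: the class and field names are hypothetical, the two transistor counts are interpreted here as densities (per mm² per layer and per mm³), efficiency is assumed to mean GFLOPS per watt, and every number is invented for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OperatingProfile:
    """One thermal/power condition the chip is tested under (hypothetical fields)."""
    name: str
    performance_gflops: float  # measured throughput in this profile
    power_w: float             # power drawn while sustaining that throughput
    edge_latency_ns: float     # latency at the edges/pins during the common test pattern

    @property
    def gflops_per_watt(self) -> float:
        # Assumed efficiency figure: performance divided by power.
        return self.performance_gflops / self.power_w


@dataclass
class ChipScorecard:
    """A table of numbers from which each application could derive its own score."""
    mtr_per_mm2_per_layer: float  # density within a single layer
    mtr_per_mm3: float            # density within a 3D volume
    profiles: list[OperatingProfile] = field(default_factory=list)
    weight_g: Optional[float] = None  # published, but not used by any score below

    def efficiency_by_profile(self) -> dict[str, float]:
        return {p.name: p.gflops_per_watt for p in self.profiles}


# Hypothetical chip with two test profiles.
card = ChipScorecard(
    mtr_per_mm2_per_layer=100.0,
    mtr_per_mm3=2000.0,
    profiles=[
        OperatingProfile("sustained", performance_gflops=1000.0, power_w=100.0, edge_latency_ns=5.0),
        OperatingProfile("burst", performance_gflops=1500.0, power_w=200.0, edge_latency_ns=4.0),
    ],
    weight_g=50.0,
)
print(card.efficiency_by_profile())  # {'sustained': 10.0, 'burst': 7.5}
```

Keeping the raw numbers separate from any single score lets a laptop vendor weight efficiency and weight heavily while a datacenter buyer weights sustained performance, without either having to redefine the measurements.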