| HN Mirror

Temperature has nothing to do with internals. Temperature is purely to do with how the logits outputted by the network are transformed into probabilities, which is completely deterministic and not learned. In fact, temperature makes it impossible for LLMs to simulate this kind of probability. As a calibrated 80-20 split at a certain low temperature would be a different split with some other temperature.