by Imnimo 784 days ago

Another interesting experiment on this front:

https://twitter.com/infobeautiful/status/1778059112250589561

One thing I would have liked to see in the blog post is some attention to temperature. It looks like they're calling ChatGPT through LangChain - what is the default temperature? If LangChain is choosing a low temperature by default, we shouldn't be surprised if we get an incorrect distribution even if ChatGPT were perfectly calibrated! My guess is that even at temperature 1, this result will roughly hold, but we should be careful not to fool ourselves.
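To make the temperature point concrete, here is a minimal sketch of what sampling temperature does to an otherwise calibrated distribution. It assumes standard temperature sampling, softmax(logits / T), which is the same as raising each probability to the power 1/T and renormalizing. The function name `apply_temperature`, the 60/40 target, and the temperatures tried are illustrative assumptions, not values taken from the blog post or from LangChain.

```python
def apply_temperature(probs, temperature):
    """Return the distribution actually sampled from at the given temperature.

    Hypothetical helper: assumes standard temperature sampling, where
    softmax(logits / T) equals p ** (1 / T) renormalized.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


# Assumed example: a model whose raw distribution is exactly the 60/40 split
# it was asked to produce, i.e. perfectly calibrated at temperature 1.
calibrated = [0.6, 0.4]

for t in (1.0, 0.5, 0.2):
    first, second = apply_temperature(calibrated, t)
    print(f"T={t}: P(first)={first:.3f}, P(second)={second:.3f}")
```

At T=1 the distribution is unchanged; as T drops, the more likely option takes a larger share than its calibrated probability. So a sharper-than-correct empirical distribution is what we'd expect from a low temperature alone, before concluding anything about the model.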

If we take the result at face value, though, it's interesting to note that GPT-4's technical report showed that the chat model (the one with RLHF and whatnot) had flatter-than-correct calibration on its logprobs. But here we're seeing sharper-than-correct. What explains the difference?