by pedrovhb 849 days ago
Close. Temperature is a scaling factor in the sampling formula: the model's scores (logits) are divided by it before being turned into probabilities, which adjusts how likely the system is to pick a next token (word/subword) that it rates as less likely than the top choice.

When temperature is 0, it always just picks the most likely one. As temperature increases, it "takes more chances" on tokens which it deems not as fitting. There's no takesies backies with autoregressive models though, so once it picks a token it has to run with it to complete the rest of the text. If temperature is too high, you get tokens that derail the train of thought, and as you increase it further the output just turns into nonsense: the probability of tokens which don't fit the context approaches the probability of tokens that do, and you're essentially just picking at random.
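To make that concrete, here is a minimal sketch of temperature sampling in plain Python. The function names and the example logits are made up for illustration, and treating temperature 0 as a plain argmax is an assumption about how implementations usually special-case it (dividing by zero isn't an option).

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 is treated as greedy argmax (assumption)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Made-up scores for four candidate tokens (hypothetical values).
logits = [4.0, 2.0, 0.5, -1.0]
for t in (0.5, 1.0, 2.0, 100.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At low temperature nearly all the probability sits on the top token; at very high temperature the four probabilities become almost equal, which is the "picking at random" regime.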

Other parameters like top_p and top_k affect which tokens are considered at all for sampling and can help control the runaway effect. For instance, there's a higher chance of staying coherent if you use a high temperature but consider only the 40 tokens which had the highest probability of appearing in the first place (top_k=40).
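A rough sketch of how top_k and top_p filtering can sit on top of temperature, again in plain Python. The helper names are hypothetical, and the order used here (temperature, then top_k, then top_p) is one common choice, not necessarily what any particular model or API does.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over raw scores."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in kept else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index after optional top_k / top_p filtering."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores: even at a high temperature, top_k=2 limits the
# choice to the two best candidates (indices 0 and 1).
example_logits = [4.0, 2.0, 0.5, -1.0, -3.0]
print(sample(example_logits, temperature=5.0, top_k=2))
```

The filtered-out tokens get probability exactly zero, so no matter how flat the temperature makes the distribution, the sampler can't wander off to a token outside the kept set.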


> There's no takesies backies with autoregressive models

Doesn’t ChatGPT use beam search?

Almost certainly not.

It's absolutely just sampling with temperature, top_p/top_k, etc. Beam search would be very expensive; I can't see them doing that for ChatGPT, which appears to be their "consumer product" and often has lower-quality results than the API.

The legacy API had a "best_of" option, but that doesn't exist in the new API.