by pedrovhb
849 days ago
Close. Temperature is a scaling factor in the sampling formula: the model's raw scores (logits) are divided by it before they're turned into probabilities, which adjusts how likely the system is to pick a next token (word/subword) that it thinks is less likely than the top choice. When temperature is 0, the effect is that it always just picks the most likely one. As temperature increases, it "takes more chances" on tokens it deems not as fitting. There's no takesies backies with autoregressive models though, so once it picks a token it has to run with it to complete the rest of the text. If temperature is too high, you get tokens that derail the train of thought, and as you increase it further it just turns into nonsense (the probability of tokens that don't fit the context approaches the probability of tokens that do, and you're essentially just picking at random). Other parameters like top p and top k affect which tokens are considered at all for sampling and can help control the runaway effect. For instance, there's a higher chance of staying cohesive if you use a high temperature but consider only the 40 tokens that had the highest probability of appearing in the first place (top k=40).
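
If it helps to see it concretely, here's a rough sketch of temperature plus top-k sampling over a list of raw scores. The function name `sample_next_token` and its parameters are made up for illustration, and treating temperature 0 as a plain argmax is an assumption about how implementations usually special-case it, not a claim about any particular model's code.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw scores (illustrative sketch only)."""
    # Rank token indices from highest to lowest score.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # top_k: only the k best-scoring tokens are considered at all.
    if top_k is not None:
        candidates = candidates[:top_k]
    # Assumption: temperature 0 means greedy decoding (always the top choice).
    if temperature == 0:
        return candidates[0]
    # Divide scores by the temperature, then apply a softmax.
    # Subtracting the max keeps exp() from overflowing.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights for us.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a very high temperature the weights flatten out toward uniform, which is the "picking at random" regime; passing `top_k=40` on top of that limits the damage to the 40 best-scoring tokens.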
Doesn’t ChatGPT use beam search?