by nodja
471 days ago
They both work on a list of tokens sorted by probability. top_k keeps a fixed number of tokens; top_p keeps the top tokens until their cumulative probability reaches the threshold p. So, for example, if the top 2 tokens have probabilities of 0.5 and 0.4, then a top_p of 0.9 would stop selecting there. Both can be chained together, and some inference engines let you change the order of the token filtering, so you can do p before k, etc. (and the same goes for all the other sampling parameters, like repetition penalty, removing the top token, DRY, etc.). Each filtering step readjusts the probabilities so they always sum to 1.
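To make that concrete, here's a rough pure-Python sketch of the two filters and the renormalization step. It isn't taken from any particular inference engine: the function names, the toy distribution, and the small floating-point tolerance in the top_p comparison are all my own assumptions, and real engines may differ on whether the threshold check is inclusive.

```python
def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def top_k_filter(probs, k):
    """Keep the k most probable tokens, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return renormalize(dict(ranked[:k]))


def top_p_filter(probs, p, eps=1e-9):
    """Keep top tokens until their cumulative probability reaches p.

    eps is an assumed tolerance so that 0.5 + 0.4 counts as reaching 0.9
    regardless of floating-point rounding.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p - eps:
            break
    return renormalize(kept)


# Hypothetical toy distribution matching the 0.5 / 0.4 example above.
probs = {"a": 0.5, "b": 0.4, "c": 0.07, "d": 0.03}

after_p = top_p_filter(probs, 0.9)                      # keeps "a" and "b"
after_k = top_k_filter(probs, 3)                        # keeps "a", "b", "c"
after_k_then_p = top_p_filter(top_k_filter(probs, 3), 0.9)  # chained: k, then p
```

Note that when chaining, the second filter sees the already renormalized probabilities from the first one, which is why the order of the steps can change which tokens survive.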