|
|
|
|
|
by flutetornado
472 days ago
|
|
My understanding is that top_k and top_p are two different methods of decoding tokens during inference. top_k=30 considers the top 30 tokens when selecting the next token to generate and top_p=0.95 considers the top 95 percentile. You should need to select only one. https://github.com/ollama/ollama/blob/main/docs/modelfile.md... Edit: Looks like both work together. "Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)" Not quite sure how this is implemented - maybe one is preferred over the other when there are enough interesting tokens! |
|
Both can be chained together and some inference engines let you change the order of the token filtering, so you can do p before k, etc. (among all other sampling parameters, like repetition penalty, removing top token, DRY, etc.) each filtering step readjusts the probabilities so they always sum to 1.