by chessgecko
810 days ago
There's some fancier stuff too, like techniques that track where recent tokens were drawn from in the distribution and adjust either top_p or the temperature so that sequences of tokens keep a minimum level of unlikeliness. Beam search is less common with really large models because it has to keep several candidate sequences alive at every step, and that computation gets really expensive.
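To make the "adjust the temperature based on recent tokens" idea concrete, here's a rough sketch, not any particular published method. It keeps a sliding window of how surprising the recently sampled tokens were (their negative log probability at temperature 1) and nudges the temperature up when the window average falls below a target, down otherwise. The class name `AdaptiveTemperatureSampler` and every parameter (`target`, `window`, `step`, `min_temp`, `max_temp`) are made up for illustration, and the defaults are arbitrary assumptions.

```python
import math
import random
from collections import deque


def softmax(logits, temperature):
    """Turn logits into probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def surprisal(logits, token):
    """-log p(token) under the unscaled (temperature 1) distribution."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[token]


class AdaptiveTemperatureSampler:
    """Hypothetical sampler: all names and defaults here are assumptions."""

    def __init__(self, target=2.0, window=8, step=0.1,
                 min_temp=0.1, max_temp=2.0, seed=None):
        self.target = target          # desired average surprisal (nats)
        self.step = step              # how far to move temperature per token
        self.min_temp = min_temp
        self.max_temp = max_temp
        self.temperature = 1.0
        self.recent = deque(maxlen=window)  # surprisal of recent tokens
        self.rng = random.Random(seed)

    def sample(self, logits):
        # Draw a token at the current temperature.
        probs = softmax(logits, self.temperature)
        token = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

        # Record how unlikely that token was, then compare the window
        # average against the target and adjust the temperature.
        self.recent.append(surprisal(logits, token))
        average = sum(self.recent) / len(self.recent)
        if average < self.target:
            # Recent output was too predictable: loosen up.
            self.temperature = min(self.max_temp, self.temperature + self.step)
        else:
            # Recent output was surprising enough: tighten up.
            self.temperature = max(self.min_temp, self.temperature - self.step)
        return token


# Toy usage with a fixed, sharply peaked set of logits.
sampler = AdaptiveTemperatureSampler(seed=0)
tokens = [sampler.sample([5.0, 0.0, 0.0, 0.0]) for _ in range(20)]
print(tokens, round(sampler.temperature, 2))
```

In a real decoder the logits would come from the model at each step, and the same feedback loop could drive top_p instead of the temperature. The fixed-step update is just the simplest thing that works for a demo.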