by miven
764 days ago
The authors mention that Jacobi decoding is equivalent to greedy autoregressive decoding, but in practice don't we often want a sampling temperature above zero to avoid repetition and excessively generic responses? I'm completely unfamiliar with this decoding strategy, so maybe I'm just missing a simple way to account for that.
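
For anyone else new to this, here is a minimal toy sketch of the equivalence claim as I understand it, not the authors' implementation. `toy_next_token` is a hypothetical stand-in for taking the argmax over a model's logits, and the function names are made up for illustration. Jacobi decoding starts from an arbitrary guess for all n tokens and recomputes every position from the previous guess until nothing changes; that fixed point matches the greedy output.

```python
def greedy_decode(next_token, prompt, n):
    """Standard greedy autoregressive decoding: one new token per step."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]


def jacobi_decode(next_token, prompt, n, pad=0):
    """Jacobi (fixed-point) decoding: refine all n positions in every sweep."""
    guess = [pad] * n  # arbitrary initial guess for the n future tokens
    sweeps = 0
    while True:
        sweeps += 1
        # Each position is recomputed from the previous sweep's guess only,
        # so with a real model these n calls could share one batched pass.
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:  # fixed point reached: nothing changed
            return guess, sweeps
        guess = new


def toy_next_token(prefix):
    """Hypothetical deterministic 'argmax' over a 50-token vocabulary."""
    return (sum(prefix) * 31 + len(prefix)) % 50


prompt = [3, 7, 11]
greedy_out = greedy_decode(toy_next_token, prompt, 8)
jacobi_out, sweeps = jacobi_decode(toy_next_token, prompt, 8)
print(greedy_out == jacobi_out, sweeps)
```

After sweep k the first k tokens are already final, so the loop stops within n + 1 sweeps. The argument relies on `next_token` being a deterministic function of the prefix: if it drew a fresh random sample on every call, the `new == guess` check would have no stable fixed point to settle on, which is presumably why the equivalence is stated for greedy decoding.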