by xiphias2
64 days ago
> Temperature / top-k sampling in verify. Currently greedy-only

This is interesting. Doesn't greedy-only decoding slow down speculative decoding significantly? In theory, the probability of needing to resample (a rejection) is (p_real - p_sample)+, which should be much smaller with a non-greedy distribution.
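A rough sketch of what I mean, assuming the standard speculative sampling accept rule (keep a drafted token x with probability min(1, p_real(x) / p_sample(x))). The function names and the toy distributions are made up for illustration and are not taken from the project being discussed:

```python
def rejection_probability(p_real, p_sample):
    """Chance that a token drafted from p_sample is rejected by the verifier.

    Assumes the accept rule min(1, p_real[x] / p_sample[x]). Under that rule
    the rejection mass is sum_x max(0, p_sample[x] - p_real[x]); since both
    distributions sum to 1, this equals sum_x max(0, p_real[x] - p_sample[x]).
    """
    return sum(max(0.0, q - p) for p, q in zip(p_real, p_sample))


def one_hot_argmax(dist):
    """Collapse a distribution onto its most likely token (greedy decoding)."""
    best = max(range(len(dist)), key=lambda i: dist[i])
    return [1.0 if i == best else 0.0 for i in range(len(dist))]


# Toy 3-token vocabulary; numbers chosen so the two argmaxes disagree.
p_real = [0.45, 0.40, 0.15]    # target model
p_sample = [0.40, 0.45, 0.15]  # draft model

# Sampled verify compares the full distributions.
sampled = rejection_probability(p_real, p_sample)

# Greedy verify compares the one-hot argmax distributions.
greedy = rejection_probability(one_hot_argmax(p_real), one_hot_argmax(p_sample))
```

Here `sampled` comes out at about 0.05 while `greedy` is 1.0, because under greedy verify a position is all-or-nothing. When the two argmaxes agree, greedy verify rejects nothing, so which mode wins overall depends on how often the draft's top token matches the target's.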