by xiphias2 64 days ago
> Temperature / top-k sampling in verify. Currently greedy-only

This is interesting. Doesn't greedy-only verification slow down speculative decoding significantly?

In theory, the probability of a rejection (and hence a resample) is the sum over tokens of (p_real - p_sample)+, which should be much smaller with a non-greedy distribution.
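
To make the (p_real - p_sample)+ term concrete, here is a rough sketch, not taken from the project being discussed. The function names and the toy distributions are made up for illustration, and the greedy case assumes both the draft and the verifier pick their own argmax token, which may not match how any particular implementation works.

```python
def rejection_probability(p_real, p_sample):
    """Sampling-based verify: sum over tokens of max(p_real - p_sample, 0).

    p_real is the target model's next-token distribution and p_sample is
    the draft model's. Both are plain lists of probabilities (hypothetical
    inputs for this sketch).
    """
    if len(p_real) != len(p_sample):
        raise ValueError("distributions must cover the same vocabulary")
    return sum(max(pr - ps, 0.0) for pr, ps in zip(p_real, p_sample))


def greedy_rejection_probability(p_real, p_sample):
    """Greedy verify (assumed setup): the draft proposes its argmax token
    and it is rejected whenever the target's argmax is a different token."""
    def argmax(p):
        return max(range(len(p)), key=p.__getitem__)
    return 0.0 if argmax(p_real) == argmax(p_sample) else 1.0


# Toy 3-token vocabulary where the two models disagree on the top token.
p_real = [0.50, 0.30, 0.20]
p_sample = [0.35, 0.45, 0.20]

print("sampling verify:", rejection_probability(p_real, p_sample))
print("greedy verify:  ", greedy_rejection_probability(p_real, p_sample))
```

In this toy case the distributions are fairly close, so the sampling-based rejection probability is small, while greedy verification rejects outright because the top tokens differ. When the top tokens agree, greedy verification never rejects, so which mode wins in practice depends on how often the argmaxes match.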