by sailingparrot
255 days ago
Not sure exactly what setup you are running. In theory, yes: a higher temperature for both models means a higher chance of overlap between their distributions, and thus fewer rejections -> faster sampling (but worse quality overall). However, if you raise the temperature but are still operating under top-k sampling with a small k, I'm not sure it will translate into any noticeable difference, since the truncation keeps your actual distributions very much non-uniform.
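
To make the overlap argument concrete, here is a rough sketch, assuming the standard speculative sampling rule (accept a drafted token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution). Under that rule the expected acceptance rate for one token is sum_x min(p(x), q(x)). The logits and the helper names (`sampling_distribution`, `expected_acceptance`) are made up for illustration, and this says nothing about how any particular provider implements it.

```python
import math

def sampling_distribution(logits, temperature, top_k=None):
    """Softmax over temperature-scaled logits, optionally truncated to the top_k tokens."""
    scaled = [x / temperature for x in logits]
    indices = range(len(scaled))
    if top_k is None:
        keep = set(indices)
    else:
        # Keep only the indices of the top_k largest logits.
        keep = set(sorted(indices, key=lambda i: scaled[i], reverse=True)[:top_k])
    peak = max(scaled[i] for i in keep)  # subtract the max for numerical stability
    weights = [math.exp(scaled[i] - peak) if i in keep else 0.0 for i in indices]
    total = sum(weights)
    return [w / total for w in weights]

def expected_acceptance(target_logits, draft_logits, temperature, top_k=None):
    """Expected per-token acceptance: sum over tokens of min(p, q)."""
    p = sampling_distribution(target_logits, temperature, top_k)
    q = sampling_distribution(draft_logits, temperature, top_k)
    return sum(min(pi, qi) for pi, qi in zip(p, q))

# Toy logits (hypothetical): the two models disagree on the most likely token.
target_logits = [2.0, 1.0, 0.0, -1.0]
draft_logits = [1.0, 2.0, 0.0, -1.0]

for temperature in (0.1, 1.0, 10.0):
    full = expected_acceptance(target_logits, draft_logits, temperature)
    greedy = expected_acceptance(target_logits, draft_logits, temperature, top_k=1)
    print(f"T={temperature}: no top-k {full:.3f}, top_k=1 {greedy:.3f}")
```

With these toy logits the acceptance rate climbs as the temperature goes up when nothing is truncated. In the extreme case of top_k=1, each distribution is one-hot no matter the temperature, so the temperature has no effect at all. Real logits can behave differently (for example, if both models already agree on the top token, a low temperature also gives high overlap), so treat this as an illustration of the mechanism only.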
I didn't set a top-k. So it seems like Together must be doing something weird in their speculative decoding implementation.