by dvdkon
14 days ago
As far as I know, speculative decoding still verifies that the proposed tokens are what the "big" model would generate; it just uses the guesses to make that process faster. Setting the probability threshold too low then shouldn't affect correctness, just speed (time will be wasted verifying bad guesses).
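To make that concrete, here is a rough toy sketch of the greedy case only. Everything in it is made up for illustration: `target_next`, `good_draft`, and `bad_draft` are hypothetical stand-ins for real models, and `k` (how many tokens the draft proposes per round) plays the role of an overly permissive threshold. Real implementations verify all proposals in one batched forward pass and, when sampling, use an accept/reject rule instead of an exact match, so treat this as an illustration of the idea rather than how any particular library does it.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical "big" model: a deterministic toy rule standing in for greedy decoding.
    return (sum(ctx) * 31 + len(ctx)) % 50


def good_draft(ctx: List[Token]) -> Token:
    # Hypothetical draft model: agrees with the target except when len(ctx) is a multiple of 4.
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def bad_draft(ctx: List[Token]) -> Token:
    # Hypothetical draft model that always guesses wrong.
    return (target_next(ctx) + 1) % 50


def greedy_decode(prompt: List[Token], n: int) -> List[Token]:
    # Reference output: the big model decoding on its own.
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out


def speculative_decode(
    prompt: List[Token], n: int, draft: Model, k: int
) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n
    wasted = 0  # proposed tokens that were thrown away
    while len(out) < goal:
        # The draft proposes up to k tokens, each conditioned on its own earlier guesses.
        proposed: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposed.append(draft(out + proposed))
        # The target checks each proposal in order and keeps only the matching prefix.
        for i, tok in enumerate(proposed):
            expected = target_next(out)
            if tok == expected:
                out.append(tok)
            else:
                out.append(expected)  # fall back to the target's own token
                wasted += len(proposed) - i  # this guess and everything after it
                break
    return out, wasted


reference = greedy_decode([1, 2, 3], 20)
good_out, good_wasted = speculative_decode([1, 2, 3], 20, good_draft, k=4)
bad_out, bad_wasted = speculative_decode([1, 2, 3], 20, bad_draft, k=4)
```

Both `good_out` and `bad_out` come out identical to `reference`; the only thing the bad draft changes is `wasted`, i.e. how much verification work gets thrown away.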