by ahmedfromtunis
321 days ago
But what would happen if the small model's prediction was "is Rome."? Wouldn't that result in costlier inference if the small model is "wrong" more often than it is correct? Also, if the small model were "correct" sufficiently more often than "wrong", wouldn't it be more efficient to get rid of the large model at that point?
|
The post-training fine-tuning costs (in the low thousands of dollars) are the main reason why speculative decoding is relatively unpopular. The most effective speculative decoding strategy requires you to train multiple prediction heads à la Medusa (or whatever succeeded it). If you don't do any fine-tuning, the probability of the small model being useful is slim. Using a randomly chosen model as your draft model will probably give you very disappointing results.
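
To put rough numbers on "useful" versus "disappointing", here is a back-of-the-envelope sketch of the usual simplified cost model. It assumes (and these are my assumptions, not measurements) that each drafted token is accepted independently with probability `alpha`, that the large model checks all `gamma` drafted tokens in one pass costing the same as a normal decode step, and that one draft step costs `cost_ratio` of a large-model step. The function and parameter names are made up for illustration.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per large-model pass.

    alpha: assumed per-token acceptance probability of the draft model.
    gamma: number of tokens the draft model proposes per pass.
    """
    if alpha >= 1.0:
        # Every draft token is accepted, plus one token from the large model.
        return float(gamma + 1)
    # Geometric series 1 + alpha + alpha^2 + ... + alpha^gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected wall-clock speedup over plain decoding with the large model.

    cost_ratio: assumed cost of one draft step relative to one large-model step.
    """
    # Each pass costs gamma draft steps plus one large-model verification step.
    cost_per_pass = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / cost_per_pass


if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.8):
        print(alpha, round(expected_speedup(alpha, gamma=4, cost_ratio=0.1), 2))
```

Under these assumptions, with `gamma=4` and `cost_ratio=0.1`, an acceptance rate of 0.8 gives roughly a 2.4x speedup, while 0.2 gives roughly 0.89x, i.e. slower than not drafting at all. So a draft model that is wrong most of the time does make inference costlier. And the large model can't simply be dropped even when the draft is usually right, because it still verifies every token, which is what keeps the output quality at the large model's level.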