|
|
|
|
|
by manmal
418 days ago
|
|
A less known feature of LM Studio I really like is speculative decoding: https://lmstudio.ai/blog/lmstudio-v0.3.10 Basically you let a very small model speculate on the next few tokens, and the large model then blesses/rejects those predictions. Depending on how well the small model performs, you get massive speedups that way. The small model has to be as close to the big model as possible - I tried this with models from different vendors and it slowed generation down by x3 or so. So, you need to use a small Qwen 2.5 with a big Qwen 2.5, etc |
|