by _neil 406 days ago
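A draft model is something you have to enable explicitly. It uses a smaller model to speculatively generate the next few tokens, which the main model then verifies. In theory that speeds up generation, since checking several guessed tokens at once is cheaper than generating them one by one with the large model.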
Here are the LM Studio docs on it:
https://lmstudio.ai/docs/app/advanced/speculative-decoding
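To make the idea concrete, here's a rough toy sketch of the greedy version of speculative decoding. This is not LM Studio's code or API: `speculative_step`, `generate`, `greedy`, `toy_target` and `toy_draft` are all made-up names, and the "models" are plain functions that return the next token for a context. Real engines verify the draft in a single batched forward pass and usually use a probabilistic acceptance rule when sampling; this sketch only shows the propose-then-verify loop.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, tokens: List[int], k: int = 4) -> List[int]:
    """Return the tokens accepted in one propose-then-verify round."""
    # 1. The small draft model guesses k tokens autoregressively (cheap).
    proposal: List[int] = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. The large target model checks each guess. A real engine would do
    #    this in one batched pass; here we call it per position for clarity.
    accepted: List[int] = []
    ctx = list(tokens)
    for t in proposal:
        expected = target(ctx)
        if expected != t:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(t)
        ctx.append(t)

    # Every guess matched, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Generate n_new tokens using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_new]


def greedy(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


# Toy models (assumptions for illustration): the target counts upward mod 10,
# and the draft agrees except that it guesses wrong after a 5.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [3]))  # [4, 5, 6]
    print(generate(toy_target, toy_draft, [3], 8) == greedy(toy_target, [3], 8))  # True
```

The point of the toy: in this greedy setup the output is the same as what the big model would have produced on its own, and the draft only changes how many tokens get accepted per round. When the draft guesses badly, most of its work is thrown away, which is why the speedup is "in theory".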