| HN Mirror



	by _neil 454 days ago
	A draft model is something you have to enable explicitly. It uses a smaller model to speculatively propose the next few tokens, which the main model then checks, in theory speeding up generation. Here are the LM Studio docs on it: https://lmstudio.ai/docs/app/advanced/speculative-decoding
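	To make the idea concrete, here's a minimal sketch of the greedy variant of speculative decoding. It is not LM Studio's implementation. `target` and `draft` are hypothetical toy functions that return the greedy next token for a sequence, standing in for the big and small models. Real engines work with probability distributions, and when sampling with temperature they use a rejection-sampling acceptance rule instead of exact-match checks. The draft proposes `k` tokens, the target verifies them, and the longest matching prefix is kept plus one token from the target. Because of that, the output is identical to decoding with the target alone.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: generate n_new tokens one at a time with a single model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding (sketch).

    Returns the generated sequence and the number of verification rounds.
    Each round stands in for one batched forward pass of the target model.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every proposed position. A real engine does this
        #    in one forward pass over seq + proposal; here we loop for clarity.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target's pass yields one bonus token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, rounds


def target(seq: List[int]) -> int:
    # Toy "big" model: a deterministic rule over the sequence.
    return (seq[-1] * 3 + len(seq)) % 10


def draft(seq: List[int]) -> int:
    # Toy "small" model: same rule, but it guesses 0 at every 5th position,
    # so some of its proposals may be rejected.
    if len(seq) % 5 == 0:
        return 0
    return (seq[-1] * 3 + len(seq)) % 10


if __name__ == "__main__":
    prompt = [1, 2]
    out, rounds = speculative_decode(target, draft, prompt, n_new=12, k=4)
    assert out == greedy_decode(target, prompt, 12)
    print(out, "in", rounds, "target passes instead of 12")
```

The speedup comes from the target checking several draft tokens per pass instead of generating one token per pass. The better the draft agrees with the target, the fewer passes you need. When the draft is often wrong, you pay for the draft's work and get little back, which is why it's "in theory" faster.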