	by jgammell 80 days ago
	When sampling from an LLM, people normally truncate the token probability distribution so that low-probability tokens are never sampled. So the model shouldn't produce really weird outputs, even if they technically have nonzero probability under the distribution it learned from the pre- or post-training data.
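
	To make the truncation concrete, here is a minimal sketch of one common variant, top-p (nucleus) truncation. The comment above doesn't say which scheme is meant, so the choice of top-p, the function names, the 0.9 threshold, and the toy distribution are all assumptions for illustration, not any particular library's API.

```python
import random


def truncate_top_p(probs, top_p=0.9):
    """Keep the highest-probability tokens until their cumulative mass
    reaches top_p, drop the rest, and renormalize what remains."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def sample(probs, rng=random):
    """Draw one token according to the given probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution; "zxqv" stands in for a weird tail token.
toy = {"the": 0.5, "a": 0.3, "cat": 0.15, "zxqv": 0.05}
truncated = truncate_top_p(toy, top_p=0.9)

# After truncation the tail token has probability exactly zero,
# so no number of draws can ever produce it.
rng = random.Random(0)
draws = [sample(truncated, rng) for _ in range(1000)]
```

	The point is that the tail token isn't just unlikely after truncation; it's removed from the support entirely, so it can't show up no matter how many samples you draw. Top-k truncation works the same way, except it keeps a fixed number of tokens instead of a fixed probability mass.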