Hacker News
by jgammell 80 days ago
When sampling from an LLM, people normally truncate the token probability distribution so that low-probability tokens are never sampled. So the model shouldn't produce really weird outputs, even if those outputs technically have nonzero probability given the pre- or post-training data.
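
A rough sketch of what that truncation can look like, assuming top-k and top-p (nucleus) as the truncation rules. The comment doesn't say which scheme is meant, and the function names and the toy 5-token distribution below are made up for illustration, not taken from any particular library.

```python
import random


def truncate_top_k(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    out = [0.0] * len(probs)
    for i in keep:
        out[i] = probs[i] / total
    return out


def truncate_top_p(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    out = [0.0] * len(probs)
    for i in keep:
        out[i] = probs[i] / mass
    return out


def sample(probs, rng=random):
    """Draw one token index according to the (truncated) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token distribution over 5 tokens (illustrative numbers).
probs = [0.5, 0.3, 0.15, 0.04, 0.01]

top_k = truncate_top_k(probs, 3)    # tokens 3 and 4 get probability 0
top_p = truncate_top_p(probs, 0.9)  # same tokens dropped for this distribution

rng = random.Random(0)
draws = [sample(top_p, rng) for _ in range(1000)]
```

After truncation the two tail tokens have probability exactly zero, so they can't show up in `draws` no matter how many samples are taken, even though the untruncated distribution gave them nonzero mass.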