Hacker News
by nosuchthing 115 days ago
LLMs can't reach anything in their training data that's less likely than the statistically most common next token, so they add random jitter when picking tokens.

With that randomness come statistically irrelevant results.
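To make the trade-off concrete, here's a minimal sketch comparing greedy decoding (always take the top token) with temperature sampling, which is the usual way that "jitter" gets added. The toy vocabulary and its scores (`LOGITS`) are made up for illustration and don't come from any real model; `greedy`, `sample`, and `softmax` are hypothetical helper names.

```python
import math
import random

# Hypothetical next-token scores (logits) for a toy vocabulary; not from any real model.
LOGITS = {"the": 3.0, "a": 2.0, "cat": 0.5, "zebra": -1.0}


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits):
    """Always return the single most likely token; no randomness involved."""
    return max(logits, key=logits.get)


def sample(logits, temperature, rng):
    """Draw one token at random, weighted by its temperature-scaled probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print("greedy:", [greedy(LOGITS) for _ in range(5)])
    for t in (0.5, 1.0, 2.0):
        draws = [sample(LOGITS, t, rng) for _ in range(1000)]
        counts = {tok: draws.count(tok) for tok in LOGITS}
        print(f"T={t}:", counts)
```

Greedy decoding returns "the" every time, so "zebra" can never appear. Sampling lets the less likely tokens through, and raising the temperature makes them show up more often, which is exactly where the less relevant output comes from.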