|
|
|
|
|
by coolfox
268 days ago
|
|
> A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic... models themselves are deterministic, this is a huge pet peeve of mine, so excuse the tangent, but the appearance of nondeterminism comes from a few sources, but imho can be largely attributed to the probabilistic methods used to get appropriate context and enable timely responses. here's an example of what I mean, a 52-card deck. The deck order is fixed once you shuffle it. Drawing "at random" is a probabilistic procedure on top of that fixed state. We do not call the deck probabilistic. We call the draw probabilistic. Another exmaple, a pot of water heating on a stove. Its temperature follows deterministic physics. A cheap thermometer adds noisy, random error to each reading. We do not call the water probabilistic. We call the measurement probabilistic. Theoretical physicists run into such problems, albeit far more complicated, and the concept for how they deal with them is called ergodicity. The models at the root of LLM's do exhibit ergodic behavior; the time average and the ensemble average of an observable are identical, i.e. the average response of a single model over a long duration and the average of many similar models at a fixed moment are equivalent. |
|
They are including the random sampler at the end of the LLM that chooses the next token. You are talking about up to, but not including, that point. But that just gives you a list of possible output tokens with values ("probabilities"), not a single choice. You can always just choose the best one, or you could add some randomness that does a weighted sample of the next token based on those values. From the user's perspective, that final sampling step is part of the overall black box that is running to give an output, and it's fair to define "the model" to include that final random step.