|
|
|
|
|
by mgraczyk
364 days ago
|
|
But that view is wrong, the model outputs multiple tokens. The right alternative view is that it's an immutable function from prefixes to a distribution over all possible sequences of tokens less than (context_len - prefix_len). There are no mutable functions that cannot be viewed as immutable in a similar way. Human brains are an immutable function from input sense-data to the combination (brain adaptation, output actions). Here "brain adaptation" doing a lot of work, but so would be "1e18 output tokens". There is much more information contained within the latter |
|