Hacker News
by mirekrusin 1092 days ago
Tokens are integers that map to pieces of text.

A token is usually part of a word: roughly 4 characters, or about 75% of a word, on average.
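As a rough illustration of that rule of thumb, here is a minimal sketch that estimates a token count from text length. The function names and the default ratios (4 characters per token, 0.75 words per token) are just the approximations above written as code, not any real tokenizer's API, and an actual tokenizer will give different counts.

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens assuming ~4 characters per token (a rule of thumb)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(text: str, words_per_token: float = 0.75) -> int:
    """Estimate tokens assuming one token covers ~75% of a word (a rule of thumb)."""
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)


if __name__ == "__main__":
    sample = "Tokens are integers that map to pieces of text."
    print("by characters:", estimate_tokens_from_chars(sample))
    print("by words:", estimate_tokens_from_words(sample))
```

The two estimates will usually disagree a little, which is expected: both are averages, and real counts depend on the tokenizer and the language of the text.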

The model outputs a list of candidate next tokens with their probabilities.

It's a short list with highest probabilities.

Temperature controls which tokens to pick - usually 0% = top one only (consistent results), closer to 100% means more randomness (more "creativity").
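To make the temperature idea concrete, here is a minimal sketch of picking one token from a short list of candidates. It assumes the common approach of dividing log-probabilities by the temperature before renormalizing, and it treats the 0% to 100% range above as 0.0 to 1.0. The function name, the example tokens, and their log-probabilities are made up for illustration; real APIs may implement sampling differently.

```python
import math
import random
from typing import Dict


def sample_with_temperature(
    logprobs: Dict[str, float], temperature: float, rng: random.Random
) -> str:
    """Pick one token from a {token: log-probability} map.

    temperature <= 0 is treated as greedy: always return the top token.
    Higher temperatures flatten the distribution, so less likely tokens
    get picked more often.
    """
    if temperature <= 0:
        return max(logprobs, key=logprobs.get)

    # Scale log-probabilities by 1/temperature.
    scaled = {tok: lp / temperature for tok, lp in logprobs.items()}

    # Softmax with the max subtracted for numerical stability.
    top = max(scaled.values())
    weights = {tok: math.exp(value - top) for tok, value in scaled.items()}

    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Hypothetical short list of candidates with their log-probabilities.
    candidates = {" cat": math.log(0.6), " dog": math.log(0.3), " fish": math.log(0.1)}
    rng = random.Random(0)
    print("temperature 0:", sample_with_temperature(candidates, 0.0, rng))
    print("temperature 1:", [sample_with_temperature(candidates, 1.0, rng) for _ in range(5)])
```

At temperature 0 the result is the same on every call; at 1.0 the candidates are drawn in proportion to their original probabilities.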

2 comments

Since we’re here: Does a “reused” token count as a second token?

For example: if you limited all input/output to the same 100 words, could you stay within the token limit permanently?

so a glorified Markov chain?
Yes, in the same sense a modern digital camera is a glorified photodiode. In both cases, light comes in, voltage comes out, and we can use it to count how much light came in.
Why stop there, it's just ones and zeroes.

It's "glorified Markov chain" in the same sense that sqlite is just "glorified bubble sort".

Don't you know that "attention is all you need"? Attention is non-markovian. It's all-to-all with some masking, not a chain.