HN Mirror

by TekMol 1090 days ago

Great, thanks for the clarification.

And how does the NN represent the token at the output layer? Is it a binary representation of the token number?

Or does it have a neuron for each token it knows and ChatGPT takes the most activated neuron as the answer?

mirekrusin 1090 days ago

Tokens are integers that map to pieces of text.

Tokens are parts of words, on average about 4 characters or 75% of a word.

It gives a list of tokens with their probabilities on output.

It's a short list with highest probabilities.

Temperature controls which token gets picked: usually 0 means the top one only (consistent results), while higher values mean more randomness (more "creativity").
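
A minimal sketch of that sampling step, assuming the model emits one raw score (logit) per token in its vocabulary. The three-token vocabulary, the score values, and the function names are made up for illustration and are not taken from any real model or API.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw per-token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Pick a token id: greedy at temperature 0, otherwise sample."""
    if temperature == 0:
        # Temperature 0 is treated as "always take the highest score".
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
greedy = sample_token(logits, 0)
cool = softmax_with_temperature(logits, 0.5)  # sharper distribution
warm = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

Lower temperatures concentrate probability on the top token and higher ones flatten the distribution, which is where the extra randomness comes from.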

joshspankit 1090 days ago

Since we’re here: Does a “reused” token count as a second token?

For example: if you limited all input/output to the same 100 words, could you stay within the token limit permanently?

vorticalbox 1090 days ago

so a glorified Markov chain?

TeMPOraL 1090 days ago

Yes, in the same sense a modern digital camera is a glorified photodiode. In both cases, light comes in, voltage comes out, and we can use it to count how much light came in.

mirekrusin 1090 days ago

Why stop there, it's just ones and zeroes.

It's a "glorified Markov chain" in the same sense that SQLite is just "glorified bubble sort".

visarga 1090 days ago

Don't you know that "attention is all you need"? Attention is non-markovian. It's all-to-all with some masking, not a chain.
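
A rough sketch of what "all-to-all with some masking" can look like, assuming a causal mask where position i may only attend to positions 0..i. The score matrix and the function name are hypothetical, and real models compute the scores from learned query and key vectors rather than taking them as input.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], with every j > i masked out."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only look at positions 0..i (the causal mask).
        visible = scores[i][: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Masked (future) positions get weight 0.
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
    return weights

scores = [[0.0] * 4 for _ in range(4)]  # hypothetical uniform scores
weights = causal_attention_weights(scores)
```

Each row spreads its weight over every earlier position at once, so the last position depends on the whole visible context rather than only on its immediate predecessor.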

zwaps 1090 days ago

Basically the latter

weinzierl 1090 days ago

The tokenization algorithms I encountered all had around 50000 tokens, which fits nicely into (and makes good use of) a 16-bit number. Is this just a coincidence or does it have advantages for the token to be a 16-bit representable number?

danuker 1090 days ago

I suspect that being 16-bit instead of 32-bit means the tokens can be packed more tightly, so some instructions can operate on more of them in parallel.

But I personally think it's a coincidence, and it just so happens that 50k tokens are enough for the level of complexity the models have right now.
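
To make the packing point concrete, here is a small sketch that stores the same token ids as unsigned 16-bit and unsigned 32-bit integers using the standard `struct` module. The ids are made-up values below 2**16, not output from a real tokenizer.

```python
import struct

token_ids = [50256, 15496, 995]  # hypothetical ids, all below 2**16

# "<" selects little-endian with standard sizes: H = 2 bytes, I = 4 bytes.
packed16 = struct.pack(f"<{len(token_ids)}H", *token_ids)
packed32 = struct.pack(f"<{len(token_ids)}I", *token_ids)

# Unpacking returns the original ids, so nothing is lost at 16 bits.
roundtrip = list(struct.unpack(f"<{len(token_ids)}H", packed16))

# A 16-bit unsigned integer can index at most 2**16 = 65536 distinct tokens.
fits_50k = 50_000 <= 2**16
fits_100k = 100_000 <= 2**16
```

The 16-bit encoding takes half the bytes, but it only works while the vocabulary has at most 65,536 entries. A larger vocabulary needs a wider integer type.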

sebzim4500 1090 days ago

Probably a coincidence. The GPT-4 and GPT-3.5 tokenizer has 100k tokens.
