| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by amelius 323 days ago

No, an LLM really uses __much__ more bits per token.

First, the embedding typically uses thousands of dimensions.

Then, the value along each dimension is represented with a floating point number which will take 16 bits (can be smaller though with higher quantization).

1 comments

blutfink 323 days ago

Of course an LLM uses more space internally for a token. But so do humans.

My point was that you compared how the LLM represents a token internally versus how “English” transmits a word. That’s a category error.

link

amelius 323 days ago

But humans we can feed ascii, whereas LLMs require token inputs. My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

link

cesarb 323 days ago

> But humans we can feed ascii, whereas LLMs require token inputs.

To be pedantic, we can't feed humans ASCII directly, we have to convert it to images or sounds first.

> My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

That could be done, by having only 256 tokens, one for each possible byte, plus perhaps a few special-use tokens like "end of sequence". But it would be much less efficient.

link

amelius 323 days ago

Why would it be less efficient, if the LLM would convert it to an embedding internally?

link

cesarb 322 days ago

Because each byte would be an embedding, instead of several bytes (a full word or part of a word) being a single embedding. The amount of time a LLM takes is proportional to the number of embeddings (or tokens, since each token is represented by an embedding) in the input, and the amount of memory used by the internal state of the LLM is also proportional to the number of embeddings in the context window (how far it looks back in the input).

link