| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by amelius 322 days ago
	But humans we can feed ascii, whereas LLMs require token inputs. My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

1 comments

cesarb 322 days ago

> But humans we can feed ascii, whereas LLMs require token inputs.

To be pedantic, we can't feed humans ASCII directly, we have to convert it to images or sounds first.

> My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

That could be done, by having only 256 tokens, one for each possible byte, plus perhaps a few special-use tokens like "end of sequence". But it would be much less efficient.

link

amelius 322 days ago

Why would it be less efficient, if the LLM would convert it to an embedding internally?

link

cesarb 322 days ago

Because each byte would be an embedding, instead of several bytes (a full word or part of a word) being a single embedding. The amount of time a LLM takes is proportional to the number of embeddings (or tokens, since each token is represented by an embedding) in the input, and the amount of memory used by the internal state of the LLM is also proportional to the number of embeddings in the context window (how far it looks back in the input).

link