by isaacfung 662 days ago
The text is converted to embeddings after tokenization. The neural network only sees vectors.
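
A rough sketch of that pipeline, in case it helps. The three-piece vocabulary, the greedy longest-match splitter and the random vectors are all made up for illustration; real tokenizers learn their vocabulary and real embeddings are trained with the model.

```python
import random

# Hypothetical toy vocabulary; a real tokenizer learns tens of thousands of pieces.
VOCAB = {"str": 0, "aw": 1, "berry": 2}
EMBED_DIM = 4

random.seed(0)
# One vector per token id. Random here; learned during training in a real model.
EMBEDDINGS = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text):
    """Greedy longest-match split of text into vocabulary pieces (an assumption,
    not how any particular production tokenizer works)."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


def embed(ids):
    """Look up one vector per token id; this is all the network receives."""
    return [EMBEDDINGS[t] for t in ids]


token_ids = tokenize("strawberry")
vectors = embed(token_ids)
print(token_ids)                      # [0, 1, 2]
print(len(vectors), len(vectors[0]))  # 3 4
```

Note that nothing in `vectors` says which letters made up each piece.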

Imagine the original question is posed in English but it is translated to Chinese and then the LLM has to answer the original question based on the Chinese translation.

It's a flaw of the tokenization we choose. We could train an LLM using letters instead of tokens as the base units, but that would be inefficient because every input becomes a much longer sequence.
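
To put a rough number on the inefficiency: the snippet below compares sequence lengths for the same sentence. Whitespace-split words are used as a crude stand-in for subword tokens (an assumption; real tokenizers split differently), and the squared ratio assumes standard self-attention, whose cost grows quadratically with sequence length.

```python
text = "The neural network only sees vectors"

char_units = list(text)    # character-level: one unit per character, spaces included
word_units = text.split()  # crude stand-in for subword tokens (assumption)

length_ratio = len(char_units) / len(word_units)
attention_cost_ratio = length_ratio ** 2  # quadratic in sequence length

print(len(char_units), len(word_units))
print(length_ratio, attention_cost_ratio)
```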


By that definition the LLM literally does not see anything. LLMs predict tokens. That's it.
The LLM sees tokens, and predicts next tokens. These tokens encode a vast world, as experienced by humans and communicated through written language. The LLM is seeing the world, but through a peephole. This is pretty neat.

The peephole will expand soon, as multimodal models come into their own, and as the models start getting mixed with robotics, allowing them to go and interact with the world more directly, instead of through the medium of human-written text.

It sees embeddings that are trained to encode semantic meaning.

The way we tokenize is just a design choice. Character-level models (e.g. karpathy's nanoGPT) exist and are used for educational purposes. You can train one to count the number of 'r's in a word.

https://x.com/karpathy/status/1816637781659254908?lang=en
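
A small illustration of why the character-level setup makes this learnable. This does not train anything; it only shows what each kind of model gets as input. Using code points as character ids and `[0, 1, 2]` as subword ids are both assumptions for the example.

```python
word = "strawberry"

# Character-level input: one id per letter, so every 'r' is its own unit.
char_ids = [ord(c) for c in word]
r_count = sum(1 for i in char_ids if i == ord("r"))

# Subword input: hypothetical ids for the pieces "str", "aw", "berry".
# The letters are not present here; the model would have to memorize
# the spelling of each piece to count them.
subword_ids = [0, 1, 2]

print(len(char_ids), r_count)  # 10 3
print(len(subword_ids))        # 3
```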