|
|
|
|
|
by isaacfung
662 days ago
|
|
The text is converted to embeddings after tokenization. The neural networwk only sees vectors. Imagine the original question is posed in English but it is translated to Chinese and then the LLM has to answer the original question based on the Chinese translation. It's a flaw of the tokenization we choose. We can train an LLM using letters instead of tokens as the base units but that would be inefficient. |
|