by int_19h
8 days ago
I don't see how any of this follows. Yes, LLMs will learn the "meaning" (here narrowly defined as relative configuration in the embedding space) of the vectors that correspond to tokens in whatever tokenizer is used to feed them. But that vector space is not discrete, and nothing precludes the model from internally operating on other vectors that it never saw in training, based on how they relate to the vectors it did see.
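To make the "not discrete" point concrete, here is a toy sketch, not a real LLM. Everything in it is an assumption for illustration: the three-word vocabulary, the random embeddings, and the single `layer` function standing in for "the rest of the model". It only shows that a vector lying between two token embeddings, which corresponds to no token at all, is still a perfectly valid input to the downstream computation.

```python
import math
import random

random.seed(0)
DIM = 4

# Hypothetical toy vocabulary: each token gets a random embedding vector.
vocab = ["cat", "dog", "car"]
embedding = {tok: [random.gauss(0, 1) for _ in range(DIM)] for tok in vocab}

# Hypothetical stand-in for the rest of the model: one linear map plus tanh.
weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]


def layer(vec):
    """Apply the fixed linear map followed by tanh to any DIM-sized vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def lerp(a, b, t):
    """Linear interpolation between vectors a and b (t=0 gives a, t=1 gives b)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# A point halfway between "cat" and "dog": not the embedding of any token.
midpoint = lerp(embedding["cat"], embedding["dog"], 0.5)
is_token = any(midpoint == vec for vec in embedding.values())

# The layer does not care: it is defined on the whole space, not just on tokens.
out = layer(midpoint)
```

The layer is a function on the entire vector space, so how it treats `midpoint` is determined by the same weights that handle the token vectors on either side of it.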