|
|
|
|
|
by armcat
498 days ago
|
|
How I see LLMs (which have roots in early word embeddings like word2vec) is not as statistical machines, but geometric machines. When you train LLMs you are essentially moving concepts around in a very high dimensional space. If we take a concept such as “a barking dog” in English, in this learned geometric space we have the same thing in French, Chinese, hex and Morse code, simply because fundamental constituents of all of those languages are in the training data, and the model has managed to squeeze all their commonalities into same regions. The statistical part really comes from sampling this geometric space. |
|