by simonh
1217 days ago
An LLM chooses the next word by scoring every candidate word against all the preceding words and turning those scores into a probability for each candidate. So if it selects ‘an’ as the next word, it’s because the weights relating ‘an’ to the preceding words, their order in the text, and their relationships with each other predicted that it should have a high probability of occurring. That’s why you can’t extract the weightings for ‘an’ discretely: they encode its connections with all the other words, combinations, sequences and clusters of words it might ever be used with, including those words’ weightings with other preceding words, their relationships, and so on.
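To make that concrete, here is a rough toy sketch, not how any real model is implemented. The vectors in `EMBED` are made-up numbers, `next_word_probs` is a hypothetical helper, and averaging the context vectors is a crude stand-in for attention (it even throws away word order, which a real model does not). It only illustrates that the probability of ‘an’ comes out of the context and the competing candidates together, so there is no standalone "weight for ‘an’" to pull out.

```python
import math

# Hypothetical 2-d vectors, invented for illustration only.
# A real model learns these and uses far more dimensions.
EMBED = {
    "I": [0.2, 0.1],
    "ate": [0.9, 0.3],
    "saw": [0.1, 0.8],
    "a": [0.5, -0.4],
    "an": [0.7, 0.6],
    "apple": [1.0, 0.2],
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(context, candidates):
    """Score each candidate against the whole context, then normalise."""
    dim = len(EMBED[context[0]])
    # Assumption: mean of the context vectors stands in for attention.
    ctx = [sum(EMBED[w][i] for w in context) / len(context) for i in range(dim)]
    # Each score is a dot product of the context summary with the candidate.
    scores = [sum(c * e for c, e in zip(ctx, EMBED[w])) for w in candidates]
    return dict(zip(candidates, softmax(scores)))


candidates = ["a", "an", "apple"]
p1 = next_word_probs(["I", "ate"], candidates)
p2 = next_word_probs(["I", "saw"], candidates)
print(p1["an"], p2["an"])  # same word, different probability per context
```

The vector for ‘an’ never changes between the two calls, yet its probability does, because the score depends on the context and the normalisation depends on every other candidate.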
Come to think of it, when someone teaches me a new concept, the principle of mass conservation for instance, in some sense they are transferring their embedding into my brain. From then on I will relate to mass conservation through what that person taught me. The transfer is a very lossy process, sure, but it is a transfer with reintegration nonetheless. Perhaps "mortal computation" [4] is a requirement.
[1] https://en.wikipedia.org/wiki/Grandmother_cell
[2] https://www.youtube.com/playlist?list=PL8FnQMH2k7jzPrxqdYufo...
[3] https://www.youtube.com/watch?v=kTcRRaXV-fg
[4] Geoffrey Hinton, The Forward-Forward Algorithm: Some Preliminary Investigations, chapter 8, https://www.cs.toronto.edu/~hinton/FFA13.pdf