by eximius
1121 days ago
Eh. I still consider them stochastic parrots. My concessions lie elsewhere, primarily in the vocabulary. We refer to algorithms like quicksort as 'reasoning' about the input, so it's fine to apply the word in the same sense to stochastic parrots. The difference between an LLM learning how to sort things and compiling an implementation of an algorithm like quicksort is not terribly large, from a certain perspective. Something I'm interested in is whether an LLM that can't sort numbers could be told how in a prompt and then do so. There are some examples of similar phenomena (the one with some kid's made-up language was interesting), which suggest that LLMs dedicate a lot of space to dynamic pattern selection in their context windows (somewhat tautological) so that prompts can tune the selection for other layers. And, of course, the lack of plasticity is really interesting.
When that is combined with the fact that transformers provably can implement proper deterministic sorting algorithms, it seems that the benefit of the doubt should go to the transformer having learned a sorting algorithm?
LLMs aren't plastic in the sense that they don't learn anything when they aren't being trained. But they can be trained to execute different programs depending on the contents of the context window, for example when it contains "wrong, try again:", so maybe they can learn from their mistakes in that sense.
But if you could teach an LLM to sort by explaining it in the context window, the network would already have necessarily learned and stored a sorting algorithm somewhere; the text "here is how sorting is done: [...]" would just be serving as the trigger for that function call.
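To make the "trigger for a function call" picture concrete, here is a toy analogy in Python. It is only a sketch of the argument, not a claim about how a transformer works internally: the names `STORED_PROGRAMS`, `sort_ascending`, `echo` and `run` are made up for illustration, and the substring match stands in for whatever the network actually does to select a behaviour.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for behaviours that were fixed at training time.
def sort_ascending(xs: List[int]) -> List[int]:
    """A sorting routine that is already 'stored' before any prompt arrives."""
    return sorted(xs)

def echo(xs: List[int]) -> List[int]:
    """Fallback behaviour when no stored program is selected."""
    return list(xs)

# Assumed mapping from trigger text to stored program (illustrative only).
STORED_PROGRAMS: Dict[str, Callable[[List[int]], List[int]]] = {
    "here is how sorting is done:": sort_ascending,
}

def run(context: str, xs: List[int]) -> List[int]:
    """Select a stored program based on the context; nothing new is learned here."""
    lowered = context.lower()
    for trigger, program in STORED_PROGRAMS.items():
        if trigger in lowered:
            return program(xs)
    return echo(xs)

if __name__ == "__main__":
    print(run("Here is how sorting is done: compare neighbours and swap.", [3, 1, 2]))
    print(run("Please repeat the numbers.", [3, 1, 2]))
```

In this picture the explanation after the trigger phrase is never read at all, which is the point of the argument: the prompt selects the behaviour, it does not create it.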