by somewhereoutth
1045 days ago
I mean that it simply surfaces patterns in the training data. So responses will be an 'aggregation' (obviously more complex than that) of similar prompt/response from the training corpus, with some randomness thrown in to make things more interesting.
A useful rule of thumb, I think, is that if you're trying to describe what LLMs can do, and what you're saying is something that a Markov chain from 2003 could also do, you're missing something. In that vein, I think describing responses as built from "similar prompt/response from the training corpus", even though you allow for "complex" aggregation, can be pretty misleading about LLM capabilities. For example, you can ask a model to write code, run the code, and give the model the error message, and the model will quite often be able to identify and correct its mistake (true for GPT-4 and Claude at least). Sure, maybe both the original broken solution and the fixed one were in the training corpus (or something similar enough was), but it's not randomness taking us from one to the other.
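To make the Markov chain comparison concrete, here's a minimal sketch of a word-level one in Python. This is my own toy, not anyone's actual 2003 implementation; the names `build_chain` and `generate` and the tiny corpus are made up for illustration. The point is that it can only ever emit a word that followed the current state somewhere in the corpus, picked at random, so "surfacing patterns plus randomness" describes it pretty well, and it has no mechanism for taking an error message and revising what it wrote.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each `order`-word state to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, start, length=10, seed=None):
    """Random-walk the chain from `start`, a tuple of `order` words."""
    rng = random.Random(seed)
    out = list(start)
    state = tuple(start)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(start):])
    return " ".join(out)


# Hypothetical toy corpus, just for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, ("the",), length=8, seed=0))
```

Every adjacent word pair in the output already appears in the corpus, which is about the strongest guarantee this kind of model gives you.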