by wavemode
792 days ago
Yeah, but this doesn't change how the model functions; it just turns reasoning into training data by example. It's not learning how to reason, it's just learning how to pretend to reason about a gradually wider variety of topics. If any LLM appears to be reasoning, that is evidence not of the model's intelligence but of the question's lack of creativity.
Consider AlphaTensor or other products in the Alpha family: they show that feedback can train a model to superhuman levels.