
o1 has moved part of the "reasoning" from training time to inference time.

I think of this as analogous to the difference between my first intuition (or memory) about a problem, as a human, and what I can achieve by thinking about it carefully for a while: I can gradually build much more powerful arguments, verify whether they work, and reject the parts that don't.

If you're familiar with chess terminology, it's moving from a model that can just "know" what the best move is to one that combines that with the ability to "calculate" the continuations of the most promising candidate moves, several moves deep.

Consider Magnus Carlsen. If he just played the first move that came to mind, he could still beat 99% of humanity at chess. But to play 2700+ rated GMs, he needs to combine that with "calculation".

Not only that: the skill of calculating must itself be trained. That means not just calculating quickly and accurately, but also knowing which parts of the search tree are worth analyzing.
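
To make the "know" vs. "calculate" distinction concrete, here is a minimal sketch on a toy game instead of chess. It says nothing about how o1 actually works; the game (take 1 to 3 stones, whoever takes the last stone wins), the deliberately crude `intuition` function, and the names `intuitive_move` and `calculate` are all my own assumptions for illustration.

```python
MOVES = (1, 2, 3)  # a player may take 1, 2 or 3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def intuition(stones, move):
    """Hypothetical, deliberately crude prior: grabbing more stones 'looks' better."""
    return move


def intuitive_move(stones):
    """'Know': play whatever the prior ranks highest, with no lookahead."""
    return max(legal_moves(stones), key=lambda m: intuition(stones, m))


def calculate(stones, depth, top_k=3):
    """'Calculate': depth-limited negamax over the top_k moves ranked by intuition.

    Returns (value, move) for the side to move:
    +1 = forced win found, -1 = forced loss, 0 = search horizon reached.
    """
    if stones == 0:
        return -1, None  # the opponent just took the last stone
    if depth == 0:
        return 0, None
    candidates = sorted(
        legal_moves(stones), key=lambda m: intuition(stones, m), reverse=True
    )[:top_k]
    best_value, best_move = -2, None
    for move in candidates:
        child_value, _ = calculate(stones - move, depth - 1, top_k)
        value = -child_value  # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move


print(intuitive_move(5))          # first instinct: take 3, which loses
print(calculate(5, depth=4))      # lookahead finds the winning move: take 1
print(calculate(5, 4, top_k=1))   # looking only at the "obvious" branch misses it
```

The last call is the point about the search tree: with `top_k=1` the search only ever follows the move intuition likes best, so the quality of the calculation depends on how good the prior is at picking branches worth analyzing.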

o1 is certainly optimized for STEM problems, but not necessarily only for strict rule-based logic. In fact, most hard STEM problems need more than deductive logic to solve, just like chess does. They require strategic thinking and intuition about which solution paths are likely to be fruitful (especially once you go beyond problems that can be solved by software such as WolframAlpha).

I think the main reason STEM problems were used for training is not so much that they're solved using strict rule-based strategies, but rather that a large number of such problems exist that have a single correct answer.
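
A single correct answer matters because it makes grading mechanical. As a rough sketch of what that could look like (I don't know what was actually used for o1; the `normalize` and `reward` functions here are hypothetical):

```python
from fractions import Fraction


def normalize(answer: str):
    """Reduce an answer to a comparable form (hypothetical helper).

    Numeric answers become exact fractions so "0.5" and "1/2" compare equal;
    anything else is compared as lowercased text.
    """
    text = answer.strip().lower().rstrip(".")
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return text


def reward(model_answer: str, reference: str) -> float:
    """1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


print(reward("0.5", "1/2"))
print(reward("41", "42"))
```

A check like this only looks at the final answer, not at the argument that led to it, which is exactly why problems with one unambiguous answer are convenient to have in large numbers.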