|
|
|
|
|
by CamperBob2
181 days ago
|
|
Interestingly, it doesn't always condition the final output. When playing with DeepSeek, for example, it's common to see the CoT arrive at a correct answer that the final answer doesn't reflect, and even vice versa, where a chain of faulty reasoning somehow yields the right final answer. It almost seems that the purpose of the CoT tokens in a transformer network is to act as a computational substrate of sorts. The exact choice of tokens may not be as important as it looks, but it's important that they are present. |
|