by im3w1l
1068 days ago
So, crazy thought I had. As far as I understand, these models can only do a fixed amount of work per token of output. So asking it to show its work has two benefits: it lets the model reference previous results it worked out, and it also plainly gives it more computational resources. So I'm curious what would happen if you prompted it to stall for time a bit with an answer like "hmm.... err... let's see.. what about 81?"
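To make the "fixed amount of work per token" idea concrete, here is a toy sketch. It assumes one forward pass per output token and treats every pass as costing the same; `WORK_PER_TOKEN` and `compute_budget` are made-up names for illustration, and the whitespace "tokenization" is not what any real tokenizer does. It also ignores that attention cost grows with context length.

```python
# Hypothetical constant: abstract "units of work" for one forward pass.
WORK_PER_TOKEN = 1


def compute_budget(output_tokens):
    """Total work spent on the output, assuming one forward pass per token."""
    return WORK_PER_TOKEN * len(output_tokens)


# Answering directly versus stalling before giving the same answer.
direct = ["81"]
stalling = ["hmm", "...", "err", "...", "let's", "see", "...", "81"]

print(compute_budget(direct))
print(compute_budget(stalling))
```

Under these assumptions the stalling answer gets more passes than the direct one, which is the extra compute the question is about. Whether that extra compute helps is a separate matter.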
Lemme check...
Prompt:
claude-instant:
mpt-30b-chat:
Other models gave correct answers as before.
So yeah, the attention mechanism was ignoring the musing tokens. It needs more task-relevant tokens (doing the math) to improve the result.
Doing the math step by step fills the context with task-relevant tokens, which raises the probability that the attention mechanism will select them and pull the next token from the right region of latent space.
The inference cycle generates each token separately, so once it has written "20+20=", predicting 40 is easy. After it writes 40, on the next iteration the attention mechanism sees "step by step", infers that the task isn't done yet, and generates "40+20=", and so on.
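The loop described here can be sketched with a toy stand-in for the model. This is only an illustration of the control flow, not how a real LLM works: `next_token` is a hypothetical hand-written rule that looks only at the current context, and each "token" is a whole chunk like " 20+20=" for readability.

```python
import re


def next_token(context):
    """Toy 'model': choose the next chunk by looking only at the context."""
    question, _, work = context.partition("\n")
    terms = [int(t) for t in re.findall(r"\d+", question)]
    # An unfinished "a+b=" at the end makes the next token easy: the sum.
    pending = re.search(r"(\d+)\+(\d+)=$", work)
    if pending:
        return str(int(pending.group(1)) + int(pending.group(2)))
    # Otherwise check how many steps are already written out.
    done = re.findall(r"\d+\+\d+=(\d+)", work)
    if len(done) >= len(terms) - 1:
        return None  # every term has been added: the task is finished
    running = int(done[-1]) if done else terms[0]
    return f" {running}+{terms[len(done) + 1]}="


def generate(prompt, max_tokens=50):
    """Autoregressive loop: each new token is appended to the context."""
    context, tokens = prompt, []
    for _ in range(max_tokens):
        tok = next_token(context)
        if tok is None:
            break
        tokens.append(tok)
        context += tok
    return tokens


print(generate("20+20+20 step by step\n"))
```

Each call sees only the text so far, so the intermediate result "40" is available to the next step only because it was written into the context.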
In much larger models, the attention mechanism sees the question and presumably finds a solved answer to that question in the model's latent space, producing a memorized result.