by cmiles74
409 days ago
It reads to me like they compare the output of different prompts and somehow reach the conclusion that Claude is generating more than one token and "planning" ahead. They leave out how this works. My guess is that they have Claude generate a set of candidate outputs and then Claude chooses the "best" candidate and returns that. I agree this improves the usefulness of the output, but I don't think this is fundamentally different from "guessing the next token". UPDATE: I read the paper and I was being overly generous. It's still just guessing the next token, as it always has. This "multi-hop reasoning" is really just another way of talking about the relationships between tokens.