by stared
491 days ago
While I like the idea of measuring subsequent steps, this way of using embeddings is the reason I wrote "Don't use cosine similarity carelessly" (https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity). Here, a cosine similarity of 1 would occur only when a step repeats the previous one word for word. That is not even a "similar thought" but some sort of LLM OCD. For anything else, cosine similarity says little. Two steps can reach opposite conclusions and still have very high cosine similarity. Conversely, a step can simply expand on the same solution, using different vocabulary or looking from another angle, and score lower. A more robust approach would be to give the whole reasoning to an LLM and ask it to grade each step against a given criterion (e.g. "grade insight in each step, from 1 to 5").
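To make the "opposite conclusions, high similarity" point concrete, here is a toy sketch. It uses plain word-count vectors as a stand-in for real embeddings (that substitution is my assumption, and a real embedding model will give different numbers), and the three example steps are made up for illustration.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors (toy stand-in for embeddings)."""
    va = Counter(re.findall(r"[a-z0-9']+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9']+", b.lower()))
    # Dot product over the words the two texts share.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical reasoning steps, invented for this example.
step_a = "Therefore the series converges, so the answer is finite."
step_b = "Therefore the series diverges, so the answer is not finite."  # opposite conclusion
step_c = "Hence the sum is bounded and we get a finite value."  # same idea, new wording

contradiction_score = bow_cosine(step_a, step_b)
paraphrase_score = bow_cosine(step_a, step_c)
print(f"opposite conclusion: {contradiction_score:.2f}")
print(f"paraphrase:          {paraphrase_score:.2f}")
```

In this toy setup the contradicting step scores about 0.87 and the paraphrase about 0.36, so the ranking is the reverse of what a "similar thought" measure should give.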
|
We actually use a variant of this approach in our reasoning prompts. We use structured output to force the LLM to think for 15 steps, and in each step we force it to generate a self-assessed score and then decide whether to CONTINUE, ADJUST, or BACKTRACK.
I go into a bit more depth about it here, with an explicit example of its thinking at the end: https://bits.logic.inc/p/the-eagles-will-win-super-bowl-lix
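For readers who want a picture of what such a structured output could look like, here is a minimal sketch. Only the 15 steps, the self-assessed score, and the three decisions come from the description above; the field names (`thought`, `score`, `decision`), the 0 to 1 score range, and the `parse_steps` helper are hypothetical and not our actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    CONTINUE = "CONTINUE"
    ADJUST = "ADJUST"
    BACKTRACK = "BACKTRACK"


@dataclass
class Step:
    thought: str        # hypothetical field name
    score: float        # self-assessed; the 0..1 range is an assumption
    decision: Decision


def parse_steps(raw: list[dict], required_steps: int = 15) -> list[Step]:
    """Validate a decoded JSON list of steps against the assumed schema."""
    if len(raw) != required_steps:
        raise ValueError(f"expected {required_steps} steps, got {len(raw)}")
    steps = []
    for i, item in enumerate(raw, start=1):
        score = float(item["score"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"step {i}: score {score} is outside 0..1")
        # Decision(...) raises ValueError for anything but the three allowed values.
        steps.append(
            Step(thought=str(item["thought"]), score=score,
                 decision=Decision(item["decision"]))
        )
    return steps
```

Rejecting anything that is not exactly the required number of steps, or that carries an unknown decision, is what makes the "force" part enforceable on the consuming side.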