by earslap 782 days ago
For existing models, are beam-search-like methods hopeless due to combinatorial explosion? Are there no smart ways to improve on them? Evaluating multiple futures will be slow, but if it means the model can give vastly better output, it might be a worthwhile trade-off in some cases. I feel like our standard way of sampling the output of LLMs is a bit too simplistic, and my hunch is that it should be possible to get a lot more out of them even if it means losing speed.

People are considering that sort of beam-search approach - this is what they call "tree of thoughts" - generate a branching tree of alternate continuations, then pick the best one based on some criteria.
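
For concreteness, here is a minimal sketch of the branch-and-pick idea, written as plain beam search. Everything in it is an assumption for illustration: `TOY_BIGRAMS` and `next_token_logprobs` are hypothetical stand-ins for a real model, and the selection criterion is just summed log-probability (tree-of-thoughts style methods would typically plug in some other evaluator).

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "model": probability of the next token given only the last one.
TOY_BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
}

def next_token_logprobs(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model call: log-probs of each possible next token."""
    return {tok: math.log(p) for tok, p in TOY_BIGRAMS[prefix[-1]].items()}

def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> List[Tuple[float, List[str]]]:
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried forward unchanged.
                candidates.append((score, seq))
                continue
            for tok, logprob in step_fn(seq).items():
                candidates.append((score + logprob, seq + [tok]))
        # Prune: this is where a good continuation can be lost.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

if __name__ == "__main__":
    for width in (1, 2):
        score, seq = beam_search(next_token_logprobs, width, 3)[0]
        print(width, seq, round(math.exp(score), 2))
```

In this toy table, width 1 (greedy) commits to "a" because it looks best locally, while width 2 finds the more probable sequence that starts with "b". The cost grows with beam width times vocabulary size per step, which is the explosion the question is worried about.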

This doesn't seem an ideal approach though, since it amounts to generating a bunch of shallow responses and picking the best, rather than the preferable alternative of thinking more deeply before generating. It's not the same as a computer chess program considering N moves ahead, where you are guaranteed that one of those move sequences really is the best one (as long as you don't accidentally prune it out). In contrast, if you generate all possible "shallow" N-token responses (bunch of monkeys gibbering), there is no guarantee any of those will be the high quality response you are hoping for.

Really planning ahead - reasoning deeply before speaking - would seem harder to implement though, since it'd involve applying a variable number of reasoning steps (maybe looping), then determining when to stop. This also seems different from the proposed insertion of "reasoning tokens", since those are shallow reasoning steps (a normal single pass through the transformer's layers), when it seems what is really needed is more depth of reasoning ("more layers"), perhaps coupled with some working memory/tokens. Both schemes (more tokens vs more depth) are also related to the wish to use a variable amount of compute for different tasks/inputs - less compute for simple tasks, more for hard ones.
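
As a very loose analogy for "apply a variable number of steps, then decide when to stop", here is a toy sketch. It is not a transformer and not anyone's proposed architecture: `iterate_until_halt`, `refine`, and `should_halt` are hypothetical names, and Newton's method for square roots merely stands in for a refinement step whose cost depends on the input.

```python
from typing import Callable, Tuple

def iterate_until_halt(
    state: float,
    refine: Callable[[float], float],
    should_halt: Callable[[float, float], bool],
    max_steps: int = 50,
) -> Tuple[float, int]:
    """Apply `refine` repeatedly; stop when `should_halt` says so or at `max_steps`.

    Returns the final state and the number of refinement steps actually used.
    """
    for step in range(1, max_steps + 1):
        new_state = refine(state)
        if should_halt(state, new_state):
            return new_state, step
        state = new_state
    return state, max_steps

def sqrt_adaptive(x: float) -> Tuple[float, int]:
    """Toy task (assumes x > 0): Newton iteration for sqrt(x), starting from 1.0."""
    def refine(guess: float) -> float:
        return 0.5 * (guess + x / guess)

    def should_halt(old: float, new: float) -> bool:
        # Halting criterion: the state has stopped changing.
        return abs(new - old) < 1e-12

    return iterate_until_halt(1.0, refine, should_halt)

if __name__ == "__main__":
    for x in (1.0, 2.0, 1e6):
        value, steps = sqrt_adaptive(x)
        print(x, value, steps)
```

The only point is the control flow: the "easy" input halts almost immediately while the "hard" one loops longer, and the halting test is a separate component from the refinement step.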

Ah yes, I totally agree. I was inspecting the method as a stopgap solution (especially because it does not require retraining or any other special tricks) until researchers figure out "planning" in a broader sense. It is very inefficient otherwise, but in the meantime, is simple sampling from the output softmax, with a couple of parameters to tune, really the best we can do? Is there no low-hanging fruit there?
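
For reference, the "simple sampling with a couple of parameters" being discussed looks roughly like the sketch below. The function name `sample_token` and its defaults are assumptions for illustration; temperature, top-k, and top-p (nucleus) are the commonly used knobs, but real inference libraries differ in details.

```python
import math
import random
from typing import List, Optional

def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index from raw logits using temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalises the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

if __name__ == "__main__":
    print(sample_token([1.0, 3.0, 2.0], temperature=0.7, top_k=2, top_p=0.9))
```

Note that every decision here is made one token at a time from a single softmax; nothing looks ahead, which is the limitation the thread is circling.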

I suppose the closest alternative to planning ahead (considering alternatives before taking any action - in this case generating tokens) is getting it right the first time, which is only really possible in highly constrained circumstances (prompts) where the model saw enough similar examples to predict the same correct/preferred response. So, to that extent, I suppose better prediction - bigger model, more/better training, etc. - reduces the need for planning a bit. Architectural changes that boost predictive power, such as adding working memory, would also help.

But, yeah, hard to see too many alternatives.

1) Get it right first time (not always possible)

2) Don't plan, but at least consider a bunch of poor alternatives - tree of thoughts

3) Actually implement planning