Hacker News new | ask | show | jobs
by 3wolf 607 days ago
> Branching predictions involves following a few logits to see what other tokens they lead to. This is often called MCTS (Monte Carlo Tree Search) and is a method that has been often tried in LLMs to middling success. One of the tradeoffs of branching is that it requires using inference compute in a way where the branches cannot benefit from each others compute.

I wonder if speculative decoding could help here? E.g. have some small model draft predictions for the branches and parallel and have to big model verify the most promising one.