|
|
|
|
|
by 3wolf
607 days ago
|
|
> Branching predictions involves following a few logits to see what other tokens they lead to. This is often called MCTS (Monte Carlo Tree Search) and is a method that has been often tried in LLMs to middling success. One of the tradeoffs of branching is that it requires using inference compute in a way where the branches cannot benefit from each others compute. I wonder if speculative decoding could help here? E.g. have some small model draft predictions for the branches and parallel and have to big model verify the most promising one. |
|