| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by 3wolf 607 days ago
	> Branching predictions involves following a few logits to see what other tokens they lead to. This is often called MCTS (Monte Carlo Tree Search) and is a method that has been often tried in LLMs to middling success. One of the tradeoffs of branching is that it requires using inference compute in a way where the branches cannot benefit from each others compute. I wonder if speculative decoding could help here? E.g. have some small model draft predictions for the branches and parallel and have to big model verify the most promising one.