by user_7832 807 days ago
Thanks, that's exactly what I was looking for! Any idea if it's possible to use beam search on local models like mistral? It sounds like the choice of beam search vs say top-p or top-k should be in the software and not embedded, right?

If you use HuggingFace models, a few of the simpler decoding algorithms are already implemented in the `generate` method of all supported models.

Here is a blog post that describes it: https://huggingface.co/blog/how-to-generate.
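
To the "software, not embedded" part of the question: as far as I can tell, yes, the decoding strategy is picked by arguments to `generate`, not by the weights. A rough sketch, assuming `transformers` and PyTorch are installed; the model name, the `complete` helper, and the parameter values are only illustrative choices of mine, not recommendations:

```python
# Decoding strategy is selected by keyword arguments to generate(),
# so the same weights can be decoded either way.
BEAM_SEARCH = {"num_beams": 4, "do_sample": False, "early_stopping": True}
TOP_K_TOP_P = {"do_sample": True, "top_k": 50, "top_p": 0.95, "temperature": 0.8}


def complete(prompt, decoding, model_name="mistralai/Mistral-7B-v0.1", max_new_tokens=64):
    """Hypothetical helper: load a causal LM and decode with the given settings."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, **decoding)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

You would call it as `complete("Some prompt", BEAM_SEARCH)` or `complete("Some prompt", TOP_K_TOP_P)`; only the second argument changes between the two strategies.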

I will warn you though that beam search is typically what you do NOT want. Beam search approximately optimizes for the "most likely sequence at the token level." This is rarely what you need in practice for open-ended generation (e.g. a question-answering chatbot). In practice, you need the "most likely semantic sequence," which is a much harder problem.
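
To make the "token level" point concrete, here is a toy sketch in plain Python. The bigram table `MODEL` is entirely made up (it stands in for a real model's next-token probabilities), and the function names are mine. Beam search hunts for the single highest-probability token string, while top-p sampling draws from the distribution instead:

```python
import math
import random

# Made-up toy "model": next-token probabilities given only the previous token.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"bird": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "bird": {"</s>": 1.0},
}


def beam_search(num_beams, max_len=5):
    """Keep the num_beams highest-scoring partial sequences at each step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for logp, toks in beams:
            if toks[-1] == "</s>":
                candidates.append((logp, toks))  # finished beams carry over unchanged
                continue
            for tok, p in MODEL[toks[-1]].items():
                candidates.append((logp + math.log(p), toks + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
    best_logp, best_toks = beams[0]
    return best_toks, math.exp(best_logp)


def sample_top_p(top_p, rng, max_len=5):
    """Sample each token from the smallest set whose probability mass reaches top_p."""
    toks = ["<s>"]
    while toks[-1] != "</s>" and len(toks) <= max_len:
        ranked = sorted(MODEL[toks[-1]].items(), key=lambda kv: kv[1], reverse=True)
        nucleus, mass = [], 0.0
        for tok, p in ranked:
            nucleus.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        words, weights = zip(*nucleus)
        toks.append(rng.choices(words, weights=weights)[0])
    return toks
```

With this table, `beam_search(1)` (i.e. greedy) settles on "the cat" with probability about 0.33, while `beam_search(2)` finds "a bird" at about 0.36. That is the most likely token string, but nothing in the procedure knows whether it is the best answer in any semantic sense. `sample_top_p(1.0, random.Random())` just draws from the full distribution, so you get different sequences across runs.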

Of course, various approximations for semantic alignment are currently in the literature, but it is still a wide-open problem.

This is actually a great question for which I found an interesting attempt: https://andys.page/posts/llm_sampling_strategies/

(No affiliation)