Hacker News
by danielmarkbruce 855 days ago
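Of course they do. Beam search is a thing. The reason it's not used as much as you might expect: cost. Do a greedy search and you run through the model x times, where x is the number of tokens generated. Branch on the top k at every step without pruning and the number of runs through the model gets astronomical quickly (on the order of k^x). Beam search prunes back to k candidates at each step, so it stays at roughly k times the cost of greedy, which is still a lot.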
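To put rough numbers on that, here is a back-of-the-envelope sketch that counts forward passes for each strategy. The function names are made up for illustration, and it assumes one forward pass per candidate sequence per step, with no batching or KV-cache savings, so treat the results as orders of magnitude rather than real timings.

```python
def greedy_passes(x: int) -> int:
    """Greedy decoding: one forward pass per generated token."""
    return x


def beam_passes(x: int, k: int) -> int:
    """Beam search with width k: one pass for the first step, then one
    pass per surviving beam (k of them) for each remaining step."""
    if x <= 0:
        return 0
    return 1 + k * (x - 1)


def full_tree_passes(x: int, k: int) -> int:
    """Branch on the top k at every step and never prune: one pass per
    node at depths 0 .. x-1, i.e. 1 + k + k^2 + ... + k^(x-1)."""
    return sum(k ** depth for depth in range(x))


if __name__ == "__main__":
    x, k = 20, 4  # 20 generated tokens, top-4 at each step
    print("greedy:   ", greedy_passes(x))
    print("beam:     ", beam_passes(x, k))
    print("full tree:", full_tree_passes(x, k))
```

With x = 20 and k = 4 that comes to 20 passes for greedy, 77 for beam search, and about 3.7 * 10^11 for the unpruned tree. In practice the k beams are usually batched into a single call, so the wall-clock gap between greedy and beam is smaller than the raw pass count suggests, but the extra compute and memory are still there.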