Hacker News
by Roark66 1021 days ago
Beam search is well known. I mean strategies like beam search, but ones we don't know about.

I can imagine some: for example, something like beam search, but with every option scored by a smaller model. Of course, one can say "but we see every token as it streams", to which I might say: are you sure? Perhaps they generate a hundred entire responses in the time it takes for one token to be shown, and just "stream" those tokens slowly to keep the output at a more "human" pace.
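To make that idea concrete, here is a rough sketch of beam search where a separate scorer ranks every candidate. This is only an illustration of the hypothetical strategy, not anything a real provider is known to do. The names `propose`, `score`, `toy_propose`, and `toy_score` are made up for the example: `propose` stands in for the large model suggesting next tokens, and `score` stands in for the smaller model.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
Seq = Tuple[Token, ...]


def beam_search_with_scorer(
    propose: Callable[[Seq], Sequence[Token]],  # stand-in for the large model
    score: Callable[[Seq], float],              # stand-in for the smaller scoring model
    beam_width: int,
    max_len: int,
    eos: Token = "<eos>",
) -> Seq:
    """Return the best sequence found, as judged by `score`."""
    beams: List[Seq] = [()]
    for _ in range(max_len):
        candidates: List[Seq] = []
        for seq in beams:
            if seq and seq[-1] == eos:
                candidates.append(seq)  # finished sequences carry over unchanged
                continue
            for tok in propose(seq):
                candidates.append(seq + (tok,))
        # Keep only the beam_width candidates the scorer likes best.
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
        if all(seq[-1] == eos for seq in beams):
            break
    return beams[0]


# Toy stand-ins (assumptions for illustration only).
def toy_propose(seq: Seq) -> Sequence[Token]:
    return ["a", "b", "<eos>"]


def toy_score(seq: Seq) -> float:
    # Prefers "b" tokens, mildly penalizes length, rewards finishing.
    bonus = 0.5 if seq and seq[-1] == "<eos>" else 0.0
    return seq.count("b") - 0.1 * len(seq) + bonus


best = beam_search_with_scorer(toy_propose, toy_score, beam_width=2, max_len=3)
print(best)
```

The point is that the user only ever sees `best`, so nothing in the streamed output reveals how many candidates were explored or discarded along the way.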

1 comment

interesting. but there should be physical limits here that we can estimate to put bounds on the speculation. for example, hardware FLOP/s has an upper bound, so you can make latency estimates for 1B/10B/100B-parameter models. this would put reasonable bounds on statements like "a hundred entire responses in the time it takes for one token to be shown"
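a back-of-the-envelope version of that estimate, as a sketch. everything numeric here is an assumption for illustration: the common rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense transformer (ignoring attention cost over the context), a hypothetical accelerator with 1e15 FLOP/s peak, a display rate of 20 tokens/s, and 500-token responses. memory bandwidth and real-world utilization would only tighten the bound further.

```python
def max_tokens_per_second(params: float, peak_flops: float) -> float:
    """Compute-bound ceiling on generated tokens per second for one device."""
    flops_per_token = 2.0 * params  # rule-of-thumb assumption, not a measurement
    return peak_flops / flops_per_token


def responses_per_displayed_token(
    params: float,
    peak_flops: float,
    display_tokens_per_s: float,
    response_tokens: float,
) -> float:
    """Upper bound on full responses generated while one token is shown."""
    display_interval_s = 1.0 / display_tokens_per_s
    tokens_in_interval = max_tokens_per_second(params, peak_flops) * display_interval_s
    return tokens_in_interval / response_tokens


# All of these constants are illustrative assumptions.
ASSUMED_PEAK_FLOPS = 1e15        # hypothetical accelerator, FLOP/s
ASSUMED_DISPLAY_RATE = 20.0      # tokens shown to the user per second
ASSUMED_RESPONSE_TOKENS = 500.0  # length of one "entire response"

for params in (1e9, 1e10, 1e11):
    bound = responses_per_displayed_token(
        params, ASSUMED_PEAK_FLOPS, ASSUMED_DISPLAY_RATE, ASSUMED_RESPONSE_TOKENS
    )
    print(f"{params:.0e} params: at most ~{bound:.1f} responses per displayed token")
```

under these assumed numbers, one such device tops out at roughly 50 responses per displayed token for a 1B model, 5 for 10B, and 0.5 for 100B. so "a hundred responses per token" would need either a small model or many devices per request, which is exactly the kind of bound that narrows the speculation.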