Hacker News new | ask | show | jobs
by potatoman22 1042 days ago
Does looping over the vocabulary add much overhead to the tok/s? I imagine they're just checking if the input is in a set, and usually there's only ~30k tokens. That's somewhat intensive, but inference on the neural net feels like it'd take longer.
1 comments

They’re checking regex partial matches for each possible completion, which is intensive indeed. You can look at the Figure 2 in our paper (link in original post) for a simple comparison with MS guidance which shows the difference.