by viraptor 526 days ago
> Can you construct an efficient algorithm for sampling from q(t_k | t_1, …, t_{k-1}), that minimizes calls to the original language model?

I feel like I'm missing some issue here... Can't you query stopping at the last full token boundary, then reject any results which don't match the character prefix and continue from there with the completion? Kind of like when you mask the invalid actions when reinforcement training on games? Or is that losing too much info?
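
For concreteness, here is a rough sketch of that masking idea, not a definitive implementation. All names are hypothetical: `next_token_probs` stands in for one call to the language model and is assumed to return a dict mapping token strings to probabilities, `context` is the token list up to the last full token boundary, and `remaining` is the part of the character prefix after that boundary.

```python
import random


def consistent(token: str, remaining: str) -> bool:
    """True if the token agrees with the still-unmatched prefix characters."""
    return token.startswith(remaining) or remaining.startswith(token)


def sample_with_prefix(next_token_probs, context, remaining, rng=random):
    """Sample tokens after `context` whose concatenation starts with `remaining`.

    `next_token_probs(tokens)` is a hypothetical stand-in for the model:
    it is assumed to return {token_string: probability}.
    """
    out = []
    while remaining:
        probs = next_token_probs(context + out)  # one model call per sampled token
        # Mask out every token that contradicts the character prefix.
        allowed = {
            t: p for t, p in probs.items()
            if t and p > 0 and consistent(t, remaining)
        }
        if not allowed:
            raise ValueError("no token matches the character prefix")
        tokens, weights = zip(*allowed.items())
        # random.choices renormalizes the weights over the allowed tokens.
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        out.append(tok)
        # Drop the characters this token covered (empty if it overshoots).
        remaining = remaining[len(tok):]
    return out
```

Once `remaining` is empty, the completion would continue with ordinary unconstrained sampling. On the "losing too much info" question: renormalizing over the allowed tokens at each step is not guaranteed to reproduce the exact conditional distribution over full continuations, so this should be read as an approximation.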

1 comment

I asked o1 to figure this out and this is essentially what it came up with as well.

https://chat.sshh.io/share/HIzUotMYVxFhRde94ZYJJ