by kgc 604 days ago
Is there a reference for this? I was wondering the same thing.

Read the original whitepaper or go look at how any framework implements it.

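You will see that any drafted token that does not match what greedy sampling of the target model would have picked is rejected. Ergo, the final output is mathematically identical to what greedy decoding with the target model alone would produce.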
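To make that concrete, here is a minimal sketch of the greedy accept/reject rule. It is not taken from any particular framework: `target` and `draft` are hypothetical toy functions standing in for real models, each mapping a token prefix to its greedy next token, and `k` is an assumed draft length. A real implementation verifies all drafted positions in one batched forward pass of the target; the loop below only mirrors the logic.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: prefix of token ids -> greedy next token id.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding with the target model only."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_greedy_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> List[int]:
    """Draft proposes up to k tokens; target keeps only those it would have picked."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model guesses a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))

        # 2. The target checks each guess in order.
        for guess in proposal:
            expected = target(seq)
            seq.append(expected)  # the emitted token is always the target's own choice
            if guess != expected:
                break  # mismatch: reject this guess and everything drafted after it
    return seq


# Toy models (assumptions for illustration): the draft agrees with the target
# except when the prefix length is a multiple of 3.
def target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def draft(seq: List[int]) -> int:
    t = target(seq)
    return t if len(seq) % 3 else (t + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    same = speculative_greedy_decode(target, draft, prompt, 20) == greedy_decode(target, prompt, 20)
    print("identical:", same)
```

Every token appended to `seq` is the target's own greedy choice for the current prefix, so the draft model can only change how many target calls are needed, never what gets emitted. This sketch covers the greedy case only; sampling with temperature uses a probabilistic acceptance rule instead.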