by bhickey 893 days ago
Why aren't they computing the next-token marginal and sampling from that? All I can come up with is that it's a reasonable way to avoid dealing with different tokenizers.