by gliptic 536 days ago
The model gives you a probability distribution over the tokens. You could use that directly with arithmetic coding, but there are ways to convert it into a distribution over, e.g., the next byte instead, which would improve efficiency further by removing the redundancy of alternative token encodings (the same text can usually be tokenized in more than one way). ts_zip does this, and the README says this works similarly to ts_zip.
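
To make the redundancy point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three-token vocabulary, the context-free probabilities in `PROBS`, and the helper names. It is not how ts_zip or any real model works. It only compares the bits needed to code one particular tokenization of a string against the bits needed if the probability of every tokenization of that string is pooled, which is roughly what coding over bytes buys you.

```python
import math

# Hypothetical toy "model": a fixed distribution that ignores context.
# A real language model would condition on the preceding tokens.
PROBS = {"a": 0.3, "b": 0.3, "ab": 0.3, "<eos>": 0.1}
VOCAB = [t for t in PROBS if t != "<eos>"]

def tokenizations(text):
    """Yield every way to split `text` into tokens from VOCAB."""
    if not text:
        yield []
        return
    for tok in VOCAB:
        if text.startswith(tok):
            for rest in tokenizations(text[len(tok):]):
                yield [tok] + rest

def seq_prob(tokens):
    """Probability the toy model assigns to `tokens` followed by <eos>."""
    p = PROBS["<eos>"]
    for t in tokens:
        p *= PROBS[t]
    return p

def ideal_bits(p):
    """Ideal code length in bits for an event of probability p."""
    return -math.log2(p)

text = "abab"
all_tok = list(tokenizations(text))
canonical = min(all_tok, key=len)  # the tokenization with the fewest tokens

# Coding tokens directly: only the canonical tokenization's probability counts.
token_bits = ideal_bits(seq_prob(canonical))
# Coding the text itself: all tokenizations of the same text pool their probability.
text_bits = ideal_bits(sum(seq_prob(t) for t in all_tok))

print(f"{len(all_tok)} tokenizations of {text!r}")
print(f"token-level cost: {token_bits:.2f} bits, text-level cost: {text_bits:.2f} bits")
```

The text-level cost can never be higher than the token-level one, because the pooled probability includes the canonical tokenization's probability plus whatever the model assigns to the alternatives.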

EDIT: Hm, or maybe ts_zip uses just the token probabilities directly. I thought it was slightly more efficient about it.

"The language model predicts the probabilities of the next token. An arithmetic coder then encodes the next token according to the probabilities."
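
For anyone who wants to see the quoted scheme in miniature, below is a sketch of an exact arithmetic coder in Python driven by next-token probabilities. It is an illustration under stated assumptions, not ts_zip's implementation: `next_token_probs` is a hypothetical stand-in that returns the same distribution regardless of context, exact `Fraction` arithmetic replaces the fixed-precision integer arithmetic a practical coder would use, and the message must end with an `<eos>` token so the decoder knows when to stop.

```python
from fractions import Fraction
import math

def next_token_probs(context):
    """Hypothetical stand-in for a language model (ignores `context`)."""
    return {
        "the": Fraction(1, 2),
        "cat": Fraction(1, 4),
        "sat": Fraction(1, 8),
        "<eos>": Fraction(1, 8),
    }

def encode(tokens):
    """Narrow [low, low + width) once per token, then emit a bit string."""
    low, width = Fraction(0), Fraction(1)
    context = []
    for tok in tokens:
        cum = Fraction(0)
        for cand, p in next_token_probs(context).items():
            if cand == tok:
                low += width * cum   # move to the token's sub-interval
                width *= p           # shrink by the token's probability
                break
            cum += p
        else:
            raise ValueError(f"token {tok!r} not in the model's vocabulary")
        context.append(tok)
    # Find the shortest n-bit string whose dyadic interval
    # [k / 2^n, (k + 1) / 2^n) fits inside the final interval.
    n = 0
    while True:
        k = math.ceil(low * 2**n)
        if Fraction(k + 1, 2**n) <= low + width:
            return format(k, "b").zfill(n) if n else ""
        n += 1

def decode(code):
    """Replay the same interval narrowing, guided by the encoded value."""
    value = Fraction(int(code, 2), 2**len(code)) if code else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    context = []
    while not context or context[-1] != "<eos>":
        cum = Fraction(0)
        for cand, p in next_token_probs(context).items():
            if low + width * cum <= value < low + width * (cum + p):
                low += width * cum
                width *= p
                context.append(cand)
                break
            cum += p
        else:
            raise ValueError("value fell outside every sub-interval")
    return context

message = ["the", "cat", "sat", "the", "<eos>"]
code = encode(message)
ideal = -sum(math.log2(next_token_probs([])[t]) for t in message)
print(f"{len(code)} bits emitted, ideal is {ideal:.1f} bits")
assert decode(code) == message
```

The better the model predicts the next token, the wider that token's sub-interval and the fewer bits the final interval needs, which is why a strong language model makes a strong compressor. The decoder has to run the same model to reproduce the same intervals.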

1 comment

Oh, that makes sense! So they use the probability of the next token itself. Thanks for clarifying. Also clever trick about the multiple potential tokens to represent the same text.