by lt 2103 days ago
A character, both input and output.

Not exactly. GPT-3 uses a variant of BPE [1], so one token can correspond to a single character, an entire word or more, or anything in between. The paper [2] says a token corresponds to 0.7 words on average.
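
To make the "anything in between" part concrete, here is a toy sketch of character-level BPE. It is not GPT-3's actual tokenizer (which works on bytes and ships a fixed, pre-trained merge table); the function names `learn_bpe`, `merge_pair` and `tokenize` and the tiny corpus are made up for illustration. It only shows how frequent strings collapse into one token while rare ones stay split into characters.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy, character-level)."""
    # Every word starts out as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair gets merged
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical corpus: "the" is frequent, "cat" is rare.
corpus = ["the"] * 10 + ["cat"] * 2
merges = learn_bpe(corpus, num_merges=2)

print(tokenize("the", merges))   # ['the']         one token for a whole word
print(tokenize("then", merges))  # ['the', 'n']    a word piece plus a character
print(tokenize("cat", merges))   # ['c', 'a', 't'] one token per character
```

With only two merges the frequent word becomes a single token and the rare one stays at one token per character. A real vocabulary has tens of thousands of merges, which is how the average ends up somewhere below one word per token.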

[1] https://en.wikipedia.org/wiki/Byte_pair_encoding

[2] https://arxiv.org/abs/2005.14165, page 24