by blatant303 1170 days ago
> Unicode characters like emojis may be split into many tokens containing the underlying bytes: ������ [<- this is a single emoji]

Source: https://platform.openai.com/tokenizer
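For anyone wondering why the quote shows a row of replacement characters: a rough sketch of the byte-level effect, using only the Python standard library. It does not run an actual tokenizer, and the emoji chosen here is a hypothetical stand-in, not necessarily the one from the quoted page. The assumption is that each token holds only part of the character's UTF-8 bytes, so each piece fails to decode on its own.

```python
# Hypothetical example emoji (thumbs up, U+1F44D); an assumption for illustration,
# not necessarily the emoji used on the quoted page.
emoji = "\U0001F44D"

# One code point, but several bytes once encoded as UTF-8.
raw = emoji.encode("utf-8")
print(len(emoji), len(raw))

# Decode each byte in isolation, roughly what a display does when a token
# carries only a fragment of the character. Each fragment is invalid UTF-8
# on its own, so it is rendered as U+FFFD (the replacement character).
pieces = [bytes([b]).decode("utf-8", errors="replace") for b in raw]
print(pieces)

# Concatenating the fragments before decoding recovers the original emoji.
restored = raw.decode("utf-8")
print(restored == emoji)
```

How the bytes are actually grouped into tokens depends on the tokenizer's vocabulary, so the number of tokens for a given emoji may differ from its byte count.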