by fooofw, 196 days ago
The tokenizer can represent an uncommon word as a sequence of several tokens. Entering your example at https://platform.openai.com/tokenizer (GPT-4o) gives me (tokens separated by "|"):
lower|case|un|se|parated|name
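
To illustrate how a word can fall apart into sub-word pieces like this, here is a rough sketch. It is not GPT-4o's actual algorithm (that is byte-pair encoding with a learned vocabulary); it is a toy greedy longest-match tokenizer, and TOY_VOCAB is a hypothetical vocabulary I picked by hand so that it reproduces the split above. The input string is simply the tokens above joined back together.

```python
def tokenize(text, vocab):
    """Toy greedy longest-match sub-word tokenizer (NOT real BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen by hand for this example only.
TOY_VOCAB = {"lower", "case", "un", "se", "parated", "name"}

text = "lowercaseunseparatedname"
pieces = tokenize(text, TOY_VOCAB)
print("|".join(pieces))  # lower|case|un|se|parated|name
```

The point is just that the whole word is not in the vocabulary, so it is covered by shorter pieces that are, and joining the pieces gives back the original string.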