dietr1ch
889 days ago
I'd guess that the tokenizer is just different and handles this in a "better" way.
goodside
889 days ago
No, in both tokenizers Unicode tag-block code points like these are converted into bytes (two tokens per character), which is a fallback for code points uncommon enough to not warrant a dedicated token.
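As a rough illustration of the byte fallback described above, the sketch below shifts ASCII text into the Unicode Tags block (U+E0000 to U+E007F) and prints the UTF-8 bytes of each resulting character. The helper names `to_tags` and `utf8_bytes` are hypothetical, and the code does not model any particular tokenizer. It only shows the raw bytes that a byte-level fallback would have to work from.

```python
# Hypothetical helpers: shift printable ASCII into the Unicode Tags block
# (U+E0000-U+E007F) and show the UTF-8 bytes each tag character encodes to.
# This does NOT reproduce any real tokenizer's merges.
TAG_BASE = 0xE0000

def to_tags(text: str) -> str:
    """Map each ASCII character c to the tag code point U+E0000 + ord(c)."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if ord(c) < 0x80)

def utf8_bytes(ch: str) -> list[str]:
    """Return the UTF-8 encoding of a single character as uppercase hex strings."""
    return [f"{b:02X}" for b in ch.encode("utf-8")]

if __name__ == "__main__":
    for ch in to_tags("Hi"):
        print(f"U+{ord(ch):05X}", utf8_bytes(ch))
    # U+E0048 ['F3', 'A0', '81', '88']
    # U+E0069 ['F3', 'A0', '81', 'A9']
```

Every tag character is four UTF-8 bytes beginning with `F3 A0`. How many tokens those four bytes become depends on which byte sequences the tokenizer has merges for, and this sketch does not attempt to model that.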