Hacker News
by goodside 878 days ago
No, in both tokenizers, Unicode tag-block code points like these fall back to byte-level encoding (two tokens per character). That fallback is what handles code points too uncommon to warrant a dedicated token.
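For anyone who wants to see the raw bytes involved, here is a minimal sketch using only the Python standard library. It does not call any tokenizer; it only shows that each tag-block code point takes four UTF-8 bytes, which is the input a byte-level fallback works from. The helper names (`to_tag_chars`, `utf8_byte_lengths`) are made up for illustration, and how those bytes are grouped into tokens depends on the tokenizer's merge table.

```python
# Tag block: U+E0000..U+E007F mirrors ASCII at a fixed offset.
TAG_BASE = 0xE0000


def to_tag_chars(text: str) -> str:
    """Map printable ASCII to the matching tag-block code points (hypothetical helper)."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def utf8_byte_lengths(text: str) -> list[int]:
    """Return the UTF-8 encoded length of each character (hypothetical helper)."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    hidden = to_tag_chars("hi")
    for ch in hidden:
        # Print the code point and its UTF-8 bytes in hex.
        print(f"U+{ord(ch):05X}", ch.encode("utf-8").hex(" "))
    print(utf8_byte_lengths(hidden))
```

To compare against real token counts, you would pass `hidden` to the tokenizer you care about and divide the token count by `len(hidden)`.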