by notatallshaw 357 days ago
It looks like TikToken is written in Rust (https://github.com/openai/tiktoken/tree/main/src). Are the gains here actually from porting to C++?

From the post:

> Profiling TikToken’s Python/Rust implementation showed a lot of time was spent doing regex matching. Most of my perf gains come from a) using a faster jit-compiled regex engine; and b) simplifying the algorithm to forego regex matching special tokens at all.
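
To make point (b) concrete, here is a rough Python sketch of the difference between a splitter that first scans for special tokens and one that skips that scan. This is only an illustration of the general idea, not the post's code or TikToken's: the names `WORD_PATTERN`, `SPECIAL_TOKENS`, `split_with_special` and `split_plain` are hypothetical, and the split pattern is a toy one, far simpler than any real tokenizer's.

```python
import re

# Toy pre-tokenization pattern (an assumption for illustration only,
# not the pattern any real tokenizer uses).
WORD_PATTERN = re.compile(r"\s?\w+|\s?[^\w\s]+|\s+")

# Illustrative special-token list; a real vocabulary defines its own.
SPECIAL_TOKENS = ["<|endoftext|>"]
SPECIAL_PATTERN = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))


def split_with_special(text):
    """Two regex passes: find special tokens, then split the text between them."""
    pieces = []
    start = 0
    for match in SPECIAL_PATTERN.finditer(text):
        # Ordinary text before the special token goes through the word pattern.
        pieces.extend(WORD_PATTERN.findall(text[start:match.start()]))
        # The special token itself is kept as a single piece.
        pieces.append(match.group())
        start = match.end()
    pieces.extend(WORD_PATTERN.findall(text[start:]))
    return pieces


def split_plain(text):
    """One regex pass: special tokens are treated as ordinary text."""
    return WORD_PATTERN.findall(text)
```

On input that contains no special tokens the two functions return the same pieces, so the extra scan is pure overhead there. The trade-off is that the plain version breaks a literal `<|endoftext|>` into ordinary pieces instead of keeping it whole.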