by vnglst 777 days ago
Inspired by Andrej Karpathy's excellent YouTube video on tokenizers, I used Llama3 to analyze all of GPT-4's 100,000 tokens, and today I wrote a blog post about it! https://koenvangilst.nl/blog/analyzing-gpt-4-tokens
1 comment

> are single characters of tokens where the origin cannot be assessed.

Is this meant to be "or"? It probably wouldn't change the output very much, though.

Good catch! Glad that LLMs are usually pretty forgiving when it comes to spelling mistakes.