Hacker News
by
vnglst
777 days ago
Inspired by Andrej Karpathy's excellent YouTube video on tokenizers, I used Llama3 to analyze all of GPT-4's 100,000 tokens, and today I wrote a blog post about it!
https://koenvangilst.nl/blog/analyzing-gpt-4-tokens
1 comment
Zambyte
777 days ago
> are single characters of tokens where the origin cannot be assessed.
Is this meant to be "or"? It probably wouldn't change the output very much though
vnglst
777 days ago
Good catch, glad that LLMs are usually pretty forgiving when it comes to spelling mistakes