Hacker News
by koakuma-chan 317 days ago
I really doubt you can fit all Harry Potter books in 1M tokens.

The series is 1,084,170 words. At, say, 1.4 tokens per word, that comes to about 1.5M tokens, so it would not fit, but it is getting close.
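A quick check of that arithmetic, as a minimal sketch. It assumes the 1.4 tokens-per-word ratio from the comment above (the real ratio depends on the tokenizer) and the 1M and 2M context windows mentioned elsewhere in this thread. `Fraction` keeps the multiplication exact.

```python
import math
from fractions import Fraction

SERIES_WORDS = 1_084_170           # word count quoted in the thread
TOKENS_PER_WORD = Fraction("1.4")  # assumed ratio; varies by tokenizer


def estimated_tokens(words: int, ratio: Fraction = TOKENS_PER_WORD) -> int:
    # Round up: a partial token still occupies a whole slot in the window.
    return math.ceil(words * ratio)


def fits(tokens: int, context_window: int) -> bool:
    return tokens <= context_window


tokens = estimated_tokens(SERIES_WORDS)
for window in (1_000_000, 2_000_000):
    print(f"{tokens:,} tokens vs {window:,}-token window: fits={fits(tokens, window)}")
```

This prints 1,517,838 tokens for the series, which overshoots a 1M window by roughly half but fits in a 2M one.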
How do they do if you test[1] them for attention deficit disorder?

[1]: https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870

It's 2M tokens for Gemini.
That was for previous iterations; 2.5 has a 1 million token context window.

https://ai.google.dev/gemini-api/docs/models (the context window is listed in each model variant's details, expanded with the + signs)

They were supposed to bump 2.5 to 2 million at some point, though. Maybe they're waiting until 3 now?

Maybe they're consuming the resources internally.
I mean, the Harry Potter books are 2M tokens.
The entire HP series is about one million words.
Harry Potter and the Order of the Phoenix alone is 400K tokens.
Out of curiosity, I found an EPUB, converted it to a .txt file, and ran it through the Qwen3 tokenizer. It yielded 359,088 tokens, end to end.

Using the GPT-4 tokenizer (cl100k_base) yields 349,371 tokens.

Recent Google and Anthropic models don't have local tokenizers and, ridiculously, make you call their APIs to count tokens, so I have no idea about those.

Just thought that was interesting.
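For anyone who wants to reproduce the cl100k_base count, here is a minimal sketch using OpenAI's `tiktoken` package. The comment doesn't say which tool was used, so `tiktoken` is an assumption, and the helper names (`count_tokens`, `cl100k_encoder`, `count_file_tokens`) are hypothetical. It assumes the EPUB has already been converted to a UTF-8 `.txt` file. Because the result depends on exactly how the text was extracted, it may not match the 349,371 figure above.

```python
import sys
from collections.abc import Callable, Sized
from pathlib import Path


def count_tokens(text: str, encode: Callable[[str], Sized]) -> int:
    """Count tokens for the whole text in one pass, end to end."""
    return len(encode(text))


def cl100k_encoder() -> Callable[[str], list[int]]:
    """Return an encoder for the GPT-4 cl100k_base encoding (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so count_tokens works without tiktoken installed

    enc = tiktoken.get_encoding("cl100k_base")
    # By default tiktoken raises if the text contains special-token strings such as
    # "<|endoftext|>"; disallowed_special=() encodes them as ordinary text instead.
    return lambda text: enc.encode(text, disallowed_special=())


def count_file_tokens(path: str, encode: Callable[[str], Sized]) -> int:
    """Read a UTF-8 text file (e.g. a book converted from EPUB) and count its tokens."""
    return count_tokens(Path(path).read_text(encoding="utf-8"), encode)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{count_file_tokens(sys.argv[1], cl100k_encoder()):,} tokens")
```

Saved as `count_tokens.py`, it would be run as `python count_tokens.py book.txt`. `tiktoken` downloads the encoding file on first use and caches it afterwards. Passing the encoder in as a function keeps the counting logic tokenizer-agnostic, so another local tokenizer could be swapped in the same way.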

And it takes up a proportional width of everyone's bookshelf alongside the others.