~8000 tokens. You can test this yourself with a series of queries of the form:
> "Please remember the word Dog. I will ask you for it later."
> "Please say the letter A 500 times."
> "Please say the letter A 500 times."
> "Please say the letter A 500 times."
...
> "Please say the letter A 500 times."
> "Please say the letter A 500 times."
> "What was the word I asked you to remember?"
It appears that user input and AI-generated output count equally against the token window. It's hard to measure precisely because the tokenizer they use isn't published, so I'm assuming ~1 word = 1 token.
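For anyone who wants to repeat the probe, here is a rough sketch of the bookkeeping in Python. It only builds the list of prompts and estimates how many filler rounds are needed; it doesn't talk to any API. The function names are made up for illustration, and the token counts rely on the ~1 word = 1 token assumption above plus the assumption that the model really does reply with about 500 tokens each round.

```python
import math

REMEMBER = "Please remember the word Dog. I will ask you for it later."
FILLER = "Please say the letter A 500 times."
RECALL = "What was the word I asked you to remember?"


def approx_tokens(text: str) -> int:
    # Assumption from the post: ~1 word = 1 token (the real tokenizer is unknown).
    return len(text.split())


def rounds_to_fill(window_tokens: int, reply_tokens: int = 500) -> int:
    # Each round costs the filler prompt plus the model's reply,
    # since input and output both count against the window.
    per_round = approx_tokens(FILLER) + reply_tokens
    return math.ceil(window_tokens / per_round)


def build_probe(window_tokens: int) -> list[str]:
    # Remember-prompt, enough filler rounds to overflow the window, then the recall question.
    return [REMEMBER] + [FILLER] * rounds_to_fill(window_tokens) + [RECALL]


if __name__ == "__main__":
    print(rounds_to_fill(8000), "filler rounds to push past ~8000 tokens")
```

If the model still answers "Dog" after that many rounds, the window is bigger than the guess; if it forgets much earlier, it is smaller (or the word-to-token estimate is off).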
To me, that suggests the model is probably at least 4x larger, possibly 16x (a bunch of layers scale with the square of the window size).
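The arithmetic behind those two figures, as a tiny sketch. The window ratio of 4 is my reading of the 4x/16x numbers (16 = 4 squared), not something measured, and the function name is hypothetical. The quadratic bound reflects that an attention score matrix over n tokens has n x n entries; how much of the actual model scales that way is a guess.

```python
def scaling_bounds(window_ratio: float) -> tuple[float, float]:
    # Lower bound: cost grows linearly with the window size.
    # Upper bound: cost grows with the square of the window size.
    return window_ratio, window_ratio ** 2


if __name__ == "__main__":
    linear, quadratic = scaling_bounds(4)
    print(f"at least {linear:g}x, possibly {quadratic:g}x")
```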
Obviously, other bits of the model design may have changed to reduce parameter count.