Hacker News new | ask | show | jobs
by throwup238 507 days ago
> max_tokens:The maximum length of the final response after the CoT output is completed, defaulting to 4K, with a maximum of 8K. Note that the CoT output can reach up to 32K tokens, and the parameter to control the CoT length (reasoning_effort) will be available soon. [1]

[1] https://api-docs.deepseek.com/guides/reasoning_model

1 comments

So yes, it's a limitation of their own API at the moment, not a model limitation.
I’m using it through Kagi which doesn’t use Deepseek’s official API [1]. That limitation from the docs seems to be everywhere.

In practice I don’t think anyone can economically host the whole model plus the kv cache for the entire context size of 128k (and I’m skeptical of Deepseek’s claims now anyway).

Edit: a Kagi team member just said on Discord that they’ll be increasing max tokens next release

[1] https://help.kagi.com/kagi/ai/llms-privacy.html