| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gliptic 507 days ago
	R1 is trained for a context length of 128K. Where are you getting 8K/32K? The model doesn't distinguish "thinking" tokens and "output" tokens, so this must be some specific API limitations.

2 comments

throwup238 507 days ago

> max_tokens：The maximum length of the final response after the CoT output is completed, defaulting to 4K, with a maximum of 8K. Note that the CoT output can reach up to 32K tokens, and the parameter to control the CoT length (reasoning_effort) will be available soon. [1]

[1] https://api-docs.deepseek.com/guides/reasoning_model

link

gliptic 507 days ago

So yes, it's a limitation of their own API at the moment, not a model limitation.

link

throwup238 507 days ago

I’m using it through Kagi which doesn’t use Deepseek’s official API [1]. That limitation from the docs seems to be everywhere.

In practice I don’t think anyone can economically host the whole model plus the kv cache for the entire context size of 128k (and I’m skeptical of Deepseek’s claims now anyway).

Edit: a Kagi team member just said on Discord that they’ll be increasing max tokens next release

[1] https://help.kagi.com/kagi/ai/llms-privacy.html

link

coliveira 507 days ago

He's just repeating a lot of disinformation that has been released about deepseek in the last few days. People who took the time to test DeepSeek models know that the results have the same or better quality for coding tasks.

link

goosejuice 507 days ago

Benchmarks are great to have but individual/org experiences on specific codebases still matter tremendously.

If an org consistently finds one model performs worse on their corpus than another, they aren't going to keep using it because it ranks higher in some set of benchmarks.

link

hn_throwaway_99 507 days ago

But you should also be very wary of these kind of anecdotes, and this thread highlights exactly why. That commenter says in another comment (https://news.ycombinator.com/item?id=42866350) that the token limitation that he is complaining about has actually nothing to do with DeepSeek's model or their API, but is a consequence of an artificial limit that Kagi imposes. In other words, his conclusion about DeepSeek is completely unwarranted.

link

throwup238 507 days ago

It mashed the header and C++ file together, which is egregiously bad in the context of QT. This isn’t a new library, it’s been around for almost thirty years. Max token sizes have nothing to do with that.

I invite anyone to post a chat transcript showing a successful run of R1 against this prompt (and please tell me which API/service it came from so I can go use it too!)

link

goosejuice 506 days ago

I wasn't suggesting using the anecdotes of others to make a decision.

I'm talking about individuals and organizations making a decision on whether or not to use a model based on their own testing. That's what ultimately matters here.

link