


	by goosejuice 507 days ago
	Benchmarks are great to have but individual/org experiences on specific codebases still matter tremendously. If an org consistently finds one model performs worse on their corpus than another, they aren't going to keep using it because it ranks higher in some set of benchmarks.


hn_throwaway_99 507 days ago

But you should also be very wary of these kinds of anecdotes, and this thread highlights exactly why. That commenter says in another comment (https://news.ycombinator.com/item?id=42866350) that the token limitation he is complaining about actually has nothing to do with DeepSeek's model or their API, but is a consequence of an artificial limit that Kagi imposes. In other words, his conclusion about DeepSeek is completely unwarranted.


throwup238 507 days ago

It mashed the header and C++ file together, which is egregiously bad in the context of Qt. This isn’t a new library; it’s been around for almost thirty years. Max token sizes have nothing to do with that.

I invite anyone to post a chat transcript showing a successful run of R1 against this prompt (and please tell me which API/service it came from so I can go use it too!)


goosejuice 506 days ago

I wasn't suggesting using the anecdotes of others to make a decision.

I'm talking about individuals and organizations making a decision on whether or not to use a model based on their own testing. That's what ultimately matters here.
