


	by sibidharan 19 days ago
	Which models are we talking about? Is there any degradation in quality or in long-context retrieval?

2 comments

throwa356262 19 days ago

The tweet mentioned DeepSeek V4 Flash.

From HF: 284B parameters (13B active), 1M context window.

This is indeed some kind of compressed context, and the quality goes down as the context grows. IIRC the V4 paper had some numbers on this.

https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash


sibidharan 18 days ago

I heard Claude models have trillions of parameters! 284B, 1M... I wouldn't trust it for long-running autonomous agents! But given the API costs, this is justified if better hardware with a bigger model is used, and it could be a Claude equivalent with comparable quality at long-context retrieval on long-running autonomous tasks. At least for me, that is important.


wilbur_whateley 19 days ago

V4 flash is much worse than any Claude model. If you're doing something simple, it can be a good way to save money though.


throwa356262 19 days ago

I agree that Claude is better (definitely better than the Flash version, which is relatively small). But...

I actually canceled my Claude Code plan a few months back after trying out some of the "lesser" models on OpenRouter. They seem to work just as well (or just as badly) for my coding tasks.


jst1fthsdys 18 days ago

Define "much worse". I use DS v4, GLM, and some Kimi with omp personally, and have Cursor with the latest Claude and GPT models at work. I notice zero difference in results for my workflow between Opus and DS.

Really confused how people make these claims. Are you just basing this off benchmarks or your own personal work? Are you an experienced dev or just doing vibe coding?


wilbur_whateley 18 days ago

My own experience. I'm working on something complex that's not in the datasets these models were trained on. There I see V4 flash breaking down and hallucinating much more often than GPT/Claude. For normal, common tasks, I also don't see much of a difference.


rjh29 18 days ago

There's huge variation in how people prompt and use their models. Vibe coding with ambiguous requirements vs. multiple steps of precise planning are completely different imo.


ninju 19 days ago

It depends on how mature the DeepSeek model became before OpenAI noticed that DeepSeek was replicating its model wholesale and started blocking access.

https://www.reuters.com/world/china/openai-accuses-deepseek-...
