HN Mirror

	by ilaksh 3 days ago
	I'm kind of poor, so I've been trying to use DeepSeek V4 Flash, GLM 5.1, etc. as much as possible recently instead of Claude or GPT.

petesergeant 3 days ago

You would do us all a service by telling us how your experiences of that have been.

RussianCow 3 days ago

I've been doing the same, though admittedly out of curiosity more so than lack of funds. The open models are catching up quickly in their abilities, to the point where they're (mostly) not doing stupid stuff regularly, but you have to be very specific about what you want. I found that Opus, for example, is much better at asking me to clear up ambiguity in a request before starting, whereas the Chinese models tend to "fill in the blanks" and make their own assumptions.

My current workflow involves going from PRD -> execution plan -> build -> review, and this works nicely with open weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4 Flash. With Opus I can generally skip the PRD entirely, and sometimes even skip the plan, and 80-90% of the time it does exactly what I want. But that can easily burn $5-15 for one feature, whereas it'll cost maybe $1-2 with the open weight models (at API pricing).

andai 3 days ago

> ... you have to be very specific about what you want. I found that Opus, for example, is much better at asking me to clear up ambiguity in a request before starting, whereas the Chinese models tend to "fill in the blanks" and make their own assumptions.

That's the main thing I've noticed. Small models can follow instructions just fine, if the instructions are very specific. But then I often have to spend more time explaining a task than it would have taken me to do it myself.

The bigger models have a lot more common sense.

I wonder if that could be improved slightly through prompting, e.g. by asking it to clarify anything that's confusing. Or maybe it just makes incorrect assumptions without realizing there's any ambiguity. One way to find out!

nchmy 1 day ago

This is my observation as well with DeepSeek V4 Flash. It takes too much initiative, and is often not particularly smart. Yet I find it is so fast and so good at iterating on and correcting its mistakes that it eventually finds its way on its own.

Though I tend to use it as a pair programmer, so I just stop it and provide guidance.

The real problem is that it is excessively verbose: it's impossible to keep up with its train of thought, and it's not practical to read it all. So I tend to just let it do its thing, then skim a bit and skip to the end for its summary.

Try the opencode go subscription: you get the Chinese models at a 6x discount. I spend about $1 a day...

ilaksh 3 days ago

I would say about 35% of the time I run into problems, eventually give up, and switch to GPT 5.5, which handles the original task much more efficiently. Then I see the token costs going up, and that motivates me to keep trying the open-source ones.

andai 3 days ago

Did you try DeepSeek V4 Pro as well? And on what kind of tasks?

I'm seeing some people say Flash is amazing and can handle everything, and others say it's useless. It seems to depend on the task. I think it depends on the harness too (it works better in Claude Code in my experience; it's probably been trained on that).

ilaksh 2 days ago

The problem for me with DeepSeek V4 Pro is that a significant amount of the time it just never seems to finish what it's doing: looong thinking, then a lot of time spent on output, or it just never finishes at all. That has happened to me several times. It could partly be my agent framework, but I have heard other people complain about that too.

Flash has its limitations, but it is way better than I'd expect from an open-source model named Flash.

Schlagbohrer 2 days ago

There's going to be a tipping point where it's worth buying more hardware to run the next size up of an open model, if the larger sizes show stepwise improvements.

polski-g 3 days ago

I used Opus 4.6, then downgraded to Sonnet, then to GLM5/5.1. GLM is as good as Sonnet. I recently started using Opus 4.8 again and GLM is not close to that.

I evaluated each one for 30 days.

csomar 2 days ago

The only one that is really close to Claude in performance is GLM-5.1. The others (MiMo, DeepSeek, etc.) look good on paper but usually fail at multi-step agentic orchestration.

This is at least my experience with Claude Code as the harness. Also, GLM pricing is not that far off Claude's. It's cheaper, but not DeepSeek cheap.

nchmy 1 day ago

DeepSeek V4 Flash is amazing.