by jbellis 373 days ago
This is a useful analysis, but only as a first cut and sometimes not even that -- Grok 3 mini and DeepSeek V3 are by far the least expensive coding models that are worth trying for scenarios where you do and don't care about the vendor training on your requests, respectively. One of those is "open source" (by which he seems to mean "open weights") but far too large to run locally.

[I guess that must be a useful market niche though, apparently this is by a company selling batch compute on exactly those small open weights models.]

The problem is the author is evaluating by dividing the Artificial Analysis score by a blended cost per token, but most tasks have an intelligence "floor" below which it doesn't matter how cheap something is: it will never succeed. And when you strip out the very high results from super cheap 4B OSS models, the rest are significantly outclassed by Flash 2.0 (not on his chart but still worth considering) and 2.5, not to mention other models that might be better in domain-specific tasks, like Grok 3 mini for code.
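To make the "floor" point concrete, here is a minimal sketch of the difference between ranking by score per dollar alone and ranking only the models that clear a minimum score. The model names, scores, prices, and the floor value are all made up for illustration; they are not Artificial Analysis numbers and not the article's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float          # benchmark score (hypothetical value)
    cost_per_mtok: float  # blended USD per million tokens (hypothetical value)


def rank_by_value(models, floor=0.0):
    """Rank by score per dollar, dropping models whose score is below `floor`."""
    eligible = [m for m in models if m.score >= floor]
    return sorted(eligible, key=lambda m: m.score / m.cost_per_mtok, reverse=True)


# Placeholder models, chosen only to show the effect of the floor.
models = [
    Model("tiny", score=20.0, cost_per_mtok=0.05),   # 400 points per dollar
    Model("mid", score=50.0, cost_per_mtok=0.50),    # 100 points per dollar
    Model("large", score=60.0, cost_per_mtok=3.00),  # 20 points per dollar
]

naive = [m.name for m in rank_by_value(models)]
floored = [m.name for m in rank_by_value(models, floor=45.0)]

print(naive)    # ['tiny', 'mid', 'large']
print(floored)  # ['mid', 'large']
```

With no floor the tiny model wins on ratio alone; once the task needs a score of at least 45 (an arbitrary threshold here), it drops out entirely and the ranking starts at the mid-sized model.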

(Nobody should be using Haiku in 2025. The OpenAI mini models are not as bad as Haiku in price/performance, and maybe there is a use case for preferring one over Flash, but if so I don't know what it is.)

1 comments

DeepSeek has a lot of competing providers that at least state they don't train on API data. OpenRouter lists a bunch of them: https://openrouter.ai/deepseek/deepseek-chat-v3-0324/provide...

(This is a big advantage of open weight models; even if they're too big to host yourself, if it's worth anything there's a lot of competition for inference)

and all of them are so much more expensive than OG deepseek as to completely remove themselves from consideration

you should probably use grok 3 mini if you want "cheapest model that is reasonably good at code"

The above link disproves your "more expensive than OG deepseek".
Only if you can't read. DSv3 is 4x cheaper than the cheapest option in that link during US business hours.