| HN Mirror


by ogrisel 504 days ago

I don't understand why it's bad for Nvidia either.

The fact that DeepSeek-R1 is so much better than DeepSeek-V3 at various important tasks means that chain-of-thought / thinking-before-answering models are better. But they are also more compute-intensive at inference time than their instruction-tuned, non-thinking counterparts.

So even if the DeepSeek-V3 pretraining + GRPO CoT post-training procedure was cheaper than anticipated to reach o1-grade performance, inference is still costly, even if you use a distilled model.

1 comment

bildung 504 days ago

DeepSeek offers API pricing directly on their website, so it's pretty easy to compare inference costs indirectly: it's $60.00 (OpenAI) vs. $2.19 (DeepSeek) per 1M output tokens. OpenAI is about 27x as expensive.
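
For anyone who wants to check the arithmetic, here is a minimal sketch using only the two prices quoted above. The `output_cost` helper and the 10,000-token response length are hypothetical, picked purely for illustration; they are not taken from either provider's documentation.

```python
# Prices per 1M output tokens in USD, as quoted in the comment above.
OPENAI_PRICE = 60.00
DEEPSEEK_PRICE = 2.19

# How many times more expensive the OpenAI price is.
ratio = OPENAI_PRICE / DEEPSEEK_PRICE
print(f"price ratio: {ratio:.1f}x")


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens (hypothetical helper)."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical response length; for a thinking model this would have to
# include the reasoning tokens if they are billed as output.
example_tokens = 10_000
print(f"OpenAI:   ${output_cost(example_tokens, OPENAI_PRICE):.4f}")
print(f"DeepSeek: ${output_cost(example_tokens, DEEPSEEK_PRICE):.4f}")
```

The ratio stays the same whatever the response length, since both costs scale linearly with the token count. What changes between models is how many tokens they emit per answer, which is the parent comment's point about thinking models.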
