
by sschueller 504 days ago

It doesn't matter now as deepseek has shown. Also, this was done under Biden, so we have no idea what Trump will do.

However, it does spoil the relationship, and maybe next time, when Switzerland is ready to spend 6B Swiss francs on planes, they won't be US-made ones.

Additionally, if Trump doesn't follow the agreed-upon global minimum tax, which Switzerland also bent over backwards for, why would it enter any such agreement with the US in the future?

2 comments

raincole 504 days ago

> It doesn't matter now as deepseek has shown

People keep saying that DeepSeek R1's training cost is just $5.6M. Where is the source?

I'm not even asking for proof. Just the source, even a self-claimed statement. I've read the R1 paper and it doesn't mention the $5.6M figure. Is it somewhere in DeepSeek's press release?

Google just gives me a lot of Medium articles and journalist sites. It sounds awfully like a number made up by some analyst that got parroted around. I've even seen people on X saying DeepSeek is "lying", while I can't even find what DeepSeek's exact claim is.

logicchains 504 days ago

It's in the DeepSeek V3 paper, not the R1 paper. https://arxiv.org/html/2412.19437v1#abstract

"assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M."

Note that's for V3, the base model; we don't know how much extra R1 cost to train.
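As a rough sanity check on the quoted sentence, the two figures in it imply a GPU-hour count. This is only a sketch of that arithmetic: the function name `implied_gpu_hours` is made up for illustration, and the $2 rate is the paper's stated assumption, not a measured price.

```python
# Figures taken from the quoted sentence of the DeepSeek V3 paper.
RENTAL_PRICE_USD_PER_GPU_HOUR = 2.0   # the paper's assumed H800 rental price
TOTAL_TRAINING_COST_USD = 5.576e6     # the paper's stated total training cost


def implied_gpu_hours(total_cost_usd: float, price_per_gpu_hour_usd: float) -> float:
    """Hypothetical helper: GPU hours implied by a total cost and an hourly rate."""
    return total_cost_usd / price_per_gpu_hour_usd


gpu_hours = implied_gpu_hours(TOTAL_TRAINING_COST_USD, RENTAL_PRICE_USD_PER_GPU_HOUR)
print(f"{gpu_hours:,.0f} H800 GPU hours")  # 2,788,000 H800 GPU hours
```

So the headline dollar figure is a GPU-hour count multiplied by an assumed rental price, and it covers V3 only.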

raincole 504 days ago

I see. Thanks for the source.

So all the claims about DeepSeek R1's cost [0] are indeed bullshit parroted around...

[0]: https://www.google.com/search?q=deepseek+r1+training+cost

Philpax 504 days ago

Not really; R1 is post-training on top of V3, which is considerably cheaper than training V3 itself. You can see this in the existence of multiple reproductions of the RL training technique by much smaller labs: https://hkust-nlp.notion.site/simplerl-reason

genewitch 504 days ago

any source?

CNBC: https://noagendaassets.com/enc/1737931632.132_cnbctechceosso...

I don't wanna do all the work, folks.

karmasimida 504 days ago

Why? Serving is still a massive effort and requires a massive amount of GPU memory to hold those models.

I don't understand the logic that DeepSeek is somehow a blow to GPU demand. If anything, more people will now try to build on top of R1-style models, which is only going to drive demand for customized training.

sschueller 504 days ago

We can buy old chips at any volume. The restriction is only on the latest and greatest.

DeepSeek has shown that you can achieve the same or better result on old hardware with less computing power.

karmasimida 504 days ago

The H800 is essentially an H100, and it is not old. And GPUs do wear out; they break down constantly. You need to swap them in and out.

Buying old chips isn't related to DeepSeek whatsoever; you can buy A100s too.
