| HN Mirror

	by kamranjon 1 hour ago
	DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.

9 comments

tomalaci 1 hour ago

Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.

Revealing optimizations similar to these would pretty much reduce their competitive position.

lwansbrough 1 hour ago

Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

I suspect their tune will change if they ever take the lead.

skeledrew just now

Regardless of where they are, the Chinese will always share their progress, as they're collectivist/cooperative at their core, compared to the individualistic/competitive US.

oefrha 1 hour ago

Which is a good thing. Self-serving motives are more reliable than altruistic ones.

intended 40 minutes ago

The world runs on incentives. Altruism and self-interest are both downstream of that.

Wikipedia is altruistic, and serves humanity quite well.

theturtletalks 34 minutes ago

Open-source is also altruistic. If DeepSeek does become self-serving once they get the top spot, it doesn’t take away from the altruistic contributions that they made towards open models.

brookst 1 minute ago

And ultimately the motivation for those contributions just doesn’t matter, except to those who like to anthropomorphize companies and argue about their souls.

nubg 1 hour ago

Very interesting take

broodbucket 1 hour ago

Look at how far OpenAI has drifted from their original mission. Everything comes back to greed, so it's ideal for the world if selfish motives happen to coincide with what's good for the world, like advancements in open models.

roenxi 1 hour ago

It's a standard take since it is how markets tend to work. They aren't powered by altruism, it is a big system for turning greed into good results. We don't have all this stuff because people suddenly woke up one morning and decided to be nice.

breezybottom 3 minutes ago

Yes but there's more to the world than markets.

AlecSchueler 48 minutes ago

Isn't it the entire basis of capitalism?

lelanthran 48 minutes ago

I don't understand what is interesting about it: it's the default.

Markets don't run on altruism.

woctordho 4 minutes ago

And humans don't run on markets.

FooBarWidget 38 minutes ago

The standard is applied very inconsistently. Nobody accuses the local bakery of being motivated by profit, or complains that it doesn't bake bread for you out of altruism.

amelius 1 hour ago

You mean more predictable, not more reliable.

rrvsh 1 hour ago

Could you explain? (asking in good faith)

IshKebab 40 minutes ago

I don't think so. I can confidently predict that altruism will give you a very unreliable income stream in the vast majority of cases.

tw1984 1 hour ago

> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

US labs at Google, Meta and SpaceX are not leading; none of them has managed to build something on par with GLM 5.2.

Care to explain to me why they still don't collaborate and still choose to do it in private?

vidarh 1 hour ago

I'm not sure I'd put Google in that list, but either way: Because they think they have enough capital that they can catch up and don't need the reputational boost of this.

CuriouslyC 1 hour ago

As good as Gemini's visual intelligence is, it's a terrible agent.

7speter 1 hour ago

Google at least still releases open source models to the public.

budsniffer952 1 hour ago

Wait, are you claiming that these companies haven't contributed to the ecosystem via research and open source?

lwansbrough 1 hour ago

No idea, I don’t work there.

jmyeet 54 minutes ago

Projection is a funny thing. It causes people to misread situations all the time. Southern slaveowners feared violent retribution from freed slaves, for example [1]. It was pure projection and said more about the South than it did the slaves. The reality was there was no violent retribution. It was the opposite where the former slaveowners continued to inflict violence on the formerly enslaved.

I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.

A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but that the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.

In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.

[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...

FooBarWidget 32 minutes ago

It's even worse than that. China publishes stacks upon stacks of policy documents in which they explain clearly what they will do and why. This includes why they do poverty alleviation and why they believe big monopolies that own everything are bad. But almost no western observers care to read those documents. Instead, western observers, including HN, speculate endlessly about China's intentions, and "it would be naive to believe they would not do X" or drawing equivalences to Soviet Union or whatever. And the "journalists" sell this notion that Chinese state intentions are "untransparent" and "unknowable" while pretending the policy documents don't exist.

Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what western audiences know about China.

This extends to beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.

Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.

jmyeet 14 minutes ago

I 100% agree with you and want to add something.

If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.

In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.

FooBarWidget 7 minutes ago

The Thucydides Trap mention is different from what you describe. Xi has dismissed the Thucydides Trap multiple times in the past as being hearsay and self-imposed bias (https://www.globaltimes.cn/content/944179.shtml). "We should strictly base our judgment on facts, lest we become victims to hearsay, paranoid or self-imposed bias. There is no such thing as the so-called Thucydides trap in the world. But should major countries time and again make the mistakes of strategic miscalculation, they might create such traps for themselves."

But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.

Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.

colordrops 1 hour ago

So the marketplace is working.

abc123abc123 1 hour ago

This is the way! Open source models will benefit, and once open source models reach the state of "good enough" the hyped up US AI companies will fear, since the availability of free, good enough, AI models will set the ceiling for how much they can charge. Then the bubble will pop.

baxtr 32 minutes ago

Who is financing DeepSeek and what are they expecting in return?

archerx 31 minutes ago

They are self financed, the company that makes DeepSeek is a finance company that trades on the markets.

rsanek 19 minutes ago

The CCP's approach has historically been to subsidize their companies far more than other countries do. Why would LLMs be any different?

https://www.oecd.org/en/data/dashboards/magic-database-indus...

baxtr 5 minutes ago

Even if they were fully self-financed, which isn’t the case, they would expect something in return.

cromka 1 hour ago

I seriously am far from fear mongering and doomsday mentality, but I just can't see how OpenAI and Anthropic can have a successful IPO if the quality gap between the free and paid continues to narrow like that...

cyanydeez 1 hour ago

fascism. it works be corporate fascism.

2838383838 1 hour ago

this place might as well be fucking reddit nowadays

cyanydeez 40 minutes ago

you're right, full of corporate sock puppets shilling their vapor wares, idly dreaming that the world isn't what it is.

budsniffer952 1 hour ago

Do you think that DeepSeek are building their models for free, or something? They aren't "on the hook" for anything?

What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.

abc123abc123 1 hour ago

This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to thinking that China or Chinese companies are saints. There are many shades of grey here and one does not exclude the other (nor include it).

budsniffer952 52 minutes ago

Are you reading the comments?

7speter 57 minutes ago

I think it’s in our best interests to lever these American AI companies to exhibit at least some degree of freedom and transparency any way we can…

darkoob12 3 minutes ago

Google and Microsoft publish more than enough, and American universities are publishing the science behind DeepSeek's engineering. The fact that you don't know about them means you're not following the science, only reading Hacker News.

utopiah 1 minute ago

It's almost as if ... they were what OpenAI was when it started. Sad to see, but glad someone is doing it.

herodoturtle 1 hour ago

Publishing by necessity, I wonder? American labs are on the cutting edge, pioneering the way forward, so DeepSeek open sourcing what they’ve got helps even the playing field.

Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.

try-working 28 minutes ago

Yes, challenger labs publish out of necessity. It is a marketing strategy. People assume open source means giving something up, but the reality is that Z.ai has a revenue of some $100M and it would be about $0M if they had never open sourced their models.

skeledrew 7 minutes ago

> Publishing by necessity

It's more a cultural thing. Sharing progress is just in their blood.

jonplackett 1 hour ago

Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?

vintermann 30 minutes ago

It used to be the case that NSA hired the majority of all math graduates in the US, and were assumed to be years ahead in cryptography. Yet in the 90s, it became clear that they no longer were that - among other things, the cipher of the notorious Clipper chip was broken, and we can rule out that it was made weak on purpose because the whole point of Clipper was that they had a backdoor.

So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia but whose own results the free world could not access, they fell behind.

I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers, because of secrecy demands: even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".

Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.

7speter 51 minutes ago

From what I gather, the Chinese are behind, but a lot of their research amounts to scrappy, clever discoveries in how to use more novel technologies (for Qwen and DeepSeek, it’s mixture-of-experts models, which can do inference using a portion of the model at a time). The Chinese also distill information from American models, so there’s that.

The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big heavy models that run on the most cutting edge Nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).

_0ffh 1 hour ago

I'm afraid I'm even balking at the word "pioneering" in context with US frontier labs. They are probably doing a few new things, right, but they are not blazing any trails for others to follow along, the Chinese are.

epolanski 1 hour ago

Chinese papers and techniques have been very influential and copied by US labs.

Multi-head Latent Attention (MLA), Multi-Token prediction, MoE architecture are some of the most famous examples.

DivingForGold 20 minutes ago

Sure, in part by "stealing" from American AI companies with Distillation attacks:

https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...

pennomi 4 minutes ago

If your moat is “please don’t copy my outputs”, you don’t have a moat. There is no such thing as a distillation “attack”.

dakolli 14 minutes ago

It's because our culture worships pieces of paper the government tells us are worth something.

IAmGraydon 10 minutes ago

Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.

epolanski 1 hour ago

R1 was very influential on US model development.

rvz 1 hour ago

Exactly. They did not have to open their research up, and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.

They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level.

Compare that to Anthropic, who are celebrating fixing a flickering issue in a terminal app, which took months to fix.

vidarh 1 hour ago

> Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.

It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: They kept dumping the entire history of the chat back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.

yorwba 1 hour ago

Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.

lelanthran 44 minutes ago

Unlikely: that product is written completely by AI, which they are not lacking.

More likely, an AI-generated codebase is impossible for humans to fix, and SOTA was not able to figure it out until now.

lionkor 47 minutes ago

that's pretty silly to use as a measure of what they do internally

jmyeet 1 hour ago

Chinese companies (and labs) operate in conjunction with the CCP so whatever they're doing, it's because it's Chinese state policy.

What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.

Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.

I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.

[1]: https://news.ycombinator.com/item?id=48667495

anon373839 30 minutes ago

I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.

jmyeet 2 minutes ago

I personally think not owning their own compute is going to be an advantage.

There is a meteor headed towards all this AI investment that I don't think has been properly accounted for, and that is: what happens to all the existing hardware investments when NVidia's next architecture comes out? Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. Now a lot of the investment hasn't been spent yet so will likely be spent on Rubin, but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?

Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.

Anyway, why I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI so aren't in as deep of a hole.

While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.

IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.

tw1984 50 minutes ago

> Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.

anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -

if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?

that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL

dude, you failed this IQ test miserably.

jampekka 38 minutes ago

The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?

tw1984 3 minutes ago

resellers wouldn't be trying to sell such junk in the first place. they use faked models to avoid the cost of Opus tokens, not to double dip to scam those with arguably the highest IQ in the country.
