| HN Mirror


	by pen2l 503 days ago
	While all of this is true, that DeepSeek wouldn't be here were it not for the research that preceded it (notably Google's paper, then Llama, and ChatGPT, which they're modeled after), its release still did something profound to their psyche: the motivation and self-actualization this instills in the Chinese. They witnessed the power of their accomplishments: a side-hustle project knocked off an easy trillion. This is only egging them on and will serve to ramp up their efforts even more. Separately, I do think that now that the Chinese leadership saw this, that they have the chops to pull this off and then some, they are probably going to rein in future innovations; they'll likely demand that the big future discoveries remain closed-sourced (or even unannounced/unpublicized).

6 comments

tedivm 503 days ago

OpenAI wouldn't be here without the work that Yann LeCun did at Facebook (back when it was Facebook). Science is built on top of science, that's just how things work.

wrasee 503 days ago

Yes, but in science you reference your work and credit those who came before you.

Edit: I am not defending OpenAI and we are all enjoying the irony here. But it puts into perspective some of the wilder claims circulating that DeepSeek was able to somehow compete with OpenAI for only $5M, as if on a level playing field.

tedivm 503 days ago

OpenAI has been hiding their datasets, and certainly hasn't credited me for the data they stole from my website and GitHub repositories. If OpenAI doesn't think they should give attribution for the data they used, it seems weird to require that of others.

Edit: Responding to your edit, DeepSeek only claimed that the final training run was $5M, not that the whole process cost that (they even call this out). I think it's important to acknowledge that, even if they did get some training data from OpenAI, this is a remarkable achievement.

wrasee 503 days ago

It is a remarkable achievement. But if “some training data from OpenAI” turns out to essentially be a wholesale distillation of their entire model (along with Llama etc) I do think that somewhat dampens the spirit of it.

We don’t know that of course. OpenAI claim to have some evidence and I guess we’ll just have to wait and see how this plays out.

There’s also a substantial difference between training on the entire internet and training that very specifically targets your competitor's products (or any specific work directly).

ambicapter 503 days ago

Only weird if you think what OpenAI did should be the norm.

wrasee 503 days ago

Right. I think many here are enjoying the Schadenfreude against OpenAI, but that hardly makes it right. It just makes it a race to the bottom.

bugglebeetle 503 days ago

Like all those papers with their long lists of citations OpenAI has been releasing?

dkjaudyeqooe 503 days ago

That's only in academia. The same thing happens in commerce, only there is no (official) credit given.

Filligree 503 days ago

That's $5M for the final training run. Which is an improvement to be sure, but it doesn't include the other training runs -- prototypes, failed runs and so forth.

coliveira 503 days ago

It is OpenAI that discredits themselves when they say that each new model is the result of hundreds of millions of USD in training. They throw this around as if it were a big advantage of their models.

nicce 503 days ago

And the cost is based on the imaginary currency that Microsoft has given to them as Azure computing.

blackeyeblitzar 503 days ago

Is that really true? If anything, OpenAI was dependent on the transformer paper from Google by Ashish Vaswani and others. LeCun has been criticizing LLM architectures for a long time and has been wrong about them for a long time.

mv4 503 days ago

That was my impression too. He is considered the inventor of the CNN back in 1998. Is there anything more recent that's meaningful?

tedivm 503 days ago

I was more referring to this paper from 2015:

https://scholar.google.com/citations?view_op=view_citation&h...

Basically all LLMs can trace their origin back to that paper.

This was just a single example though. The whole point is that people build on the work from the past, and that this is normal.

esafak 502 days ago

That's just an overview paper for those new to the field. The transformer architecture has a better claim to being the origin of LLMs.

mv4 503 days ago

Thank you for sharing this.

blackeyeblitzar 503 days ago

Personally, I have not seen anything from him that is meaningful. OpenAI and Anthropic (itself started by former OpenAI people) of course have built their models without LeCun’s contributions. And for a few years now, LeCun has been giving the same talk anywhere he makes appearances, saying that large language models are a dead end and that other approaches like his JEPA architecture are the future. Meanwhile current LLM architecture has continued to evolve and become very useful. As for the misuse of the term “open source”, I think that really began once he was at Meta, and is a way to use his fame to market Llama and help Meta not look irrelevant.

tedivm 502 days ago

They literally cited LeCun in their GPT papers.

amelius 503 days ago

By the way, as someone who once did classical image recognition using convolutions, I can't say I was very impressed by the CNN approach, especially since their implementation didn't even use FFTs for efficiency.

zbendefy 503 days ago

Also without the "Attention Is All You Need" paper from Google.

nicce 503 days ago

We wouldn't be here discussing this if nobody had invented the internet... nor would these models have had any training data at all.

> Separately, I do think that now that the Chinese leadership saw this, that they have the chops to pull this off and then some, they are probably going to rein in future innovations; they'll likely demand that the big future discoveries remain closed-sourced (or even unannounced/unpublicized).

How do we know that this is not already happening with OpenAI/Meta and the U.S. government at some level? The concept of power is equal, whether we want it or not. We don't have to pretend to be "better" all the time.

openrisk 503 days ago

> they'll likely demand that the big future discoveries remain closed-sourced

Depends on whether they want these tools to be adopted in the wider world. Rightly or wrongly there is a lot of suspicion in the West and an open source approach builds trust.

hn_throwaway_99 503 days ago

> While all of this is true, that DeepSeek wouldn't be here were it not for the research that preceded it (notably Llama), and ChatGPT which they're modeled after...

If the allegation is true (we don't know yet), then what you've written perfectly proves the point everyone is making. ChatGPT wouldn't be here if it weren't for all the research and work that preceded it in terms of tons of scrapable content being available on the Internet, and it's not like OpenAI invented transformers either.

Nobody is accusing DeepSeek of hacking into OpenAI's systems and stealing their content. OpenAI is just saying they scraped them in an "unauthorized" manner. The hypocrisy is laughably striking, but sadly nobody has any shame anymore in this world it seems. Play me the world's tiniest violin for OpenAI.

dismalaf 503 days ago

Don't forget all the research that came before OpenAI and ChatGPT...

stravant 503 days ago

Yes, and what does preceding research do? Get followed by more research building on it.

dylan604 503 days ago

Standing on the shoulders and it's turtles all the way
