| HN Mirror



	by Imnimo 546 days ago
	I think it would cast doubt on the narrative "you could have trained o1 with much less compute, and r1 is proof of that" if it turned out that, in order to train r1 in the first place, you had to have access to a bunch of outputs from o1. In other words, you had to do the really expensive o1 training in the first place. (With the caveat that all we have right now are accusations that DeepSeek made use of OpenAI data. It might just as well turn out that DeepSeek really did work independently, and you really could have gotten o1-like performance with much less compute.)

9 comments

deepGem 546 days ago

From the R1 paper

In this study, we demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start. Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data

Is this cold-start data what OpenAI is claiming is their output? If so, what's the big deal?

Imnimo 546 days ago

DeepSeek claims that the cold-start data is from DeepSeekV3, which is the model that has the $5.5M pricetag. If that data were actually the output of o1 (a model that had a much higher training cost, and its own RL post-training), that would significantly change the narrative of R1's development, and what's possible to build from scratch on a comparable training budget.

TheGeminon 546 days ago

In the paper DeepSeek just says they have ~800k responses that they used for the cold start data on R1, and are very vague about how they got it:

> To collect such data, we have explored several approaches: using few-shot prompting with a long CoT as an example, directly prompting models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-Zero outputs in a readable format, and refining the results through post-processing by human annotators.

Imnimo 546 days ago

My surface-level reading of these two sections is that the 800k samples come from R1-Zero (i.e. "the above RL training") and V3:

>We curate reasoning prompts and generate reasoning trajectories by performing rejection sampling from the checkpoint from the above RL training. In the previous stage, we only included data that could be evaluated using rule-based rewards. However, in this stage, we expand the dataset by incorporating additional data, some of which use a generative reward model by feeding the ground-truth and model predictions into DeepSeek-V3 for judgment.

>For non-reasoning data, such as writing, factual QA, self-cognition, and translation, we adopt the DeepSeek-V3 pipeline and reuse portions of the SFT dataset of DeepSeek-V3. For certain non-reasoning tasks, we call DeepSeek-V3 to generate a potential chain-of-thought before answering the question by prompting.

The non-reasoning portion of the DeepSeek-V3 dataset is described as:

>For non-reasoning data, such as creative writing, role-play, and simple question answering, we utilize DeepSeek-V2.5 to generate responses and enlist human annotators to verify the accuracy and correctness of the data.

I think if we were to take them at their word on all this, it would imply there is no specific OpenAI data in their pipeline (other than perhaps their pretraining corpus containing some incidental ChatGPT outputs that are posted on the web). I guess it's unclear where they got the "reasoning prompts" and corresponding answers, so you could sneak in some OpenAI data there?
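
For readers unfamiliar with the "rejection sampling" step quoted above, here is a minimal sketch of the idea. It assumes a hypothetical `generate` callable standing in for the RL checkpoint and a made-up exact-match rule as the reward; none of these names come from the paper, and this is not DeepSeek's code.

```python
import random
from typing import Callable, List, Tuple


def rule_based_reward(response: str, ground_truth: str) -> bool:
    """Hypothetical rule: the text after the last 'Answer:' must match exactly."""
    _, sep, tail = response.rpartition("Answer:")
    return bool(sep) and tail.strip() == ground_truth.strip()


def rejection_sample(
    prompts: List[Tuple[str, str]],
    generate: Callable[[str], str],
    samples_per_prompt: int = 4,
) -> List[Tuple[str, str]]:
    """Sample several responses per prompt and keep only those that pass the rule."""
    kept = []
    for prompt, truth in prompts:
        for _ in range(samples_per_prompt):
            response = generate(prompt)
            if rule_based_reward(response, truth):
                kept.append((prompt, response))
    return kept


def toy_model(prompt: str) -> str:
    """Stand-in for the checkpoint (assumption): picks one of two answers at random."""
    answer = random.choice(["4", "5"])
    return f"Let me think step by step. Answer: {answer}"


if __name__ == "__main__":
    random.seed(0)
    data = rejection_sample([("What is 2 + 2?", "4")], toy_model)
    print(len(data), "of 4 samples kept")
```

The kept pairs would then serve as SFT data. The open question in this thread is where the prompts and ground-truth answers come from, which the sketch simply takes as given.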

deepGem 546 days ago

That's what I am gathering as well. Where is OpenAI going to have substantial proof to claim that their outputs were used?

The reasoning prompts and answers for SFT from V3, you mean? No idea. For that matter, you have no idea where OpenAI got its data from either. If they open this can of worms, their own can of worms will be opened as well.

IAmGraydon 546 days ago

>Where is OpenAI going to have substantial proof to claim that their outputs were used?

I assume in their API logs.

rekttrader 546 days ago

Shibboleths in output data

joe_the_user 546 days ago

It's like the claim "they showed anyone can create a powerful model from scratch" becomes "false yet true".

Maybe they needed OpenAI for their process. But now that their model is open source, anyone can use that as their cold start and spend the same amount.

"From scratch" is a moving target. No one who makes their model with massive data from the net is really doing anything from scratch.

bmicraft 546 days ago

Yeah, but that kills the implied hope of building a better model for cheaper. Like this you'll always have a ceiling of being a bit worse than the OpenAI models.

roenxi 546 days ago

The logic doesn't exactly hold; it is like saying that a student is limited by their teachers. It is certainly possible that a bad teacher will hold the student back, but ultimately a student can lag or improve on the teacher with only a little extra stimulus.

They probably would need some other source of truth than an existing model, but it isn't clear how much additional data is needed.

reassess_blind 546 days ago

Isn't DeepSeek a bit better, not worse?

diedyesterday 546 days ago

Don't forget that this model probably has far fewer params than o1 or even 4o. This is a compression/distillation, which means it frees up a lot of compute resources to build models much more powerful than o1. At least this allows further scaling compute-wise (if not in the amount of non-synthetic source material available for training).

Loic 546 days ago

Not for me. As I build a chemical factory, I do not reinvent everything.

They are using the current SOTA tools and models to build new models for cheaper.

vlovich123 546 days ago

If R1 were better than O1, yes you would be right. But the reporting I’ve seen is that it’s almost as good. Being able to copy cutting edge models won’t advance the state of the art in terms of intelligence. They have made improvements in other areas, but if they reused O1 to train their model, that would be effectively a ctrl-c / ctrl-v strictly in terms of task performance.

unclebucknasty 546 days ago

It's not just about whether competitors can improve on OpenAI's models. It's about whether they can continually create reasonable substitutes for orders of magnitude less investment.

vlovich123 546 days ago

> It's about whether they can continually create reasonable substitutes for orders of magnitude less investment

That just means that the edge you’re able to retain if you invest $1B is nonexistent. It also means there’s a huge disincentive to invest $1B if your reward instantly evaporates. That would normally be fine if the competitor is otherwise able to get to that new level without the $1B. But if it relies on your $1B to then be able to put in $100M in the first place to replicate your investment, it essentially means the market for improvements disappears OR there’s legislation written to ensure competitors aren’t allowed to do that.

This is a tragedy of the commons, and we already have a historical example of how humans tried to deal with it and all the problems that come with it. Producing a book requires substantial capital, but copying it requires a lot less. Copyright law, however flawed and imperfect, tries to protect the incentive to create in the face of that.

PeterStuer 546 days ago

Strong disagree. Copy/paste would mean they took o1's weights and started finetuning from there. That is not what happened here at all.

vlovich123 546 days ago

First, there could have been industrial espionage involved so who knows. Ignoring that, you’re missing what I’m saying. Think of it this way - if it requires O1’s input to reach almost the same task performance, then this approach gives you a cheap way to replicate the performance of a leading edge model at a fraction of the cost. It does not give you a way to train something that beats a cutting edge model. Cutting edge models require a lot of R&D & capital expenditure - if they’re just going to be trivially copied after public availability, the response is going to be legislation to keep the incentive there to keep meaningful investments in that area. Otherwise you’re going to have another AI winter where progress shrivels because investment dollars dry up.

That’s why it’s so hard to understand the true cost of training Deepseek whereas it’s a little bit easier for cutting edge models (& even then still difficult).

skinner_ 546 days ago

When you build a new model, there is a spectrum of how you use the old model: 1. taking the weights, 2. training on the logits, 3. training on model output, 4. training from scratch. We don't know how much advantage #3 gives. It might be the case that with enough output from the old model, it is almost as useful as taking the weights.
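
To make the gap between #2 and #3 concrete, here is a rough sketch of the two training signals for a single next-token prediction. The function names are made up for illustration and the code uses plain Python lists rather than any real training framework: training on logits matches the teacher's whole distribution, while training on output only sees the one token the teacher happened to emit.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(
    teacher_logits: List[float], student_logits: List[float], temperature: float = 1.0
) -> float:
    """#2: KL(teacher || student) over the full next-token distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def output_distillation_loss(sampled_token: int, student_logits: List[float]) -> float:
    """#3: cross-entropy on the single token the teacher actually emitted."""
    q = softmax(student_logits)
    return -math.log(q[sampled_token])


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [3.0, 2.0, 1.0]
    print("logit loss:", logit_distillation_loss(teacher, student))
    print("output loss (teacher emitted token 2):", output_distillation_loss(2, student))
```

An API that returns only text gives you #3 at best, so each sample carries less information than the logits would; whether enough samples close that gap is exactly the unknown.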

powerapple 546 days ago

I lean toward the idea that R1-Zero was trained from a cold start; at the same time, they have tried many things, including using OpenAI APIs. These things can happen in parallel.

manquer 546 days ago

> you had to do the really expensive o1 training in the first place

It is no better for OpenAI in this scenario either, any competitor can easily copy their expensive training without spending the same, i.e. there is a second mover advantage and no economic incentive to be the first one.

To put it another way, the $500 billion Stargate investment will be worth just $5 billion once the models become available for consumption, because it will only take that much to replicate the same outcomes with new techniques, even if the cold start needed o1 output for RL.

hattmall 546 days ago

Shouldn't OpenAI be able to rather easily detect such usage?

hmmm-i-wonder 546 days ago

Now that it's been done, is OpenAI needed or can you iterate on DeepSeek only moving forward?

My understanding is this effectively builds on OpenAI's very expensive initial work, provides a "nearly as good as" model for orders of magnitude cheaper to train and run, that also provides a basis to continue building on and improving without openAI, and without human bottlenecks.

That cuts OAI off at the knees in terms of market viability after billions have been spent. If DS can iterate and match the capabilities of the current in-development OAI models in the next year, it may come down to regulatory capture and government intervention to ensure its viability as a company.

manquer 545 days ago

You cannot really have government intervention against open source and weights successfully.

The attempt in cryptography with PGP and export controls made that clear.

Even if DS specifically is banned (and even effectively), a dozen other clean room replications following their published methods will become available.

It is possible this government will ban all “unapproved” LLMs not running at an authorized provider [1], saying they are a weapon, or AGI, or Skynet, or whatever makes the powers that be sound important, thus establishing the need for control [2]. The rest of the world will keep innovating.

—-

[1] Bans only need to work economically, not at the information level, i.e. organizations with liability considerations will not use “unapproved” ones, and they are the ones who account for the bulk of the money, and that is what they need to protect.

[2] If they were smart they could do this positively, without the backlash bans would have, by giving protections to compliant models, like legal indemnity for model companies and users, without necessarily blocking others.

hmmm-i-wonder 545 days ago

I agree they can't really _successfully_ intervene, but I have very high expectations that they will attempt to in some manner.

MrLeap 546 days ago

o1 wouldn't exist without the combined compute of every mind that led to the training data they used in the first place. How many h100 equivalents are the rolling continuum of all of human history?

dchichkov 546 days ago

It should be possible to learn to reason from scratch. And the ability to reason in a long context seems to be very general.

Nevermark 546 days ago

How does one learn reasoning from scratch?

Human reasoning, as it exists today, is the result of tens of thousands of years of intuition slowly distilled down to efficient abstract concepts like "numbers", "zero", "angles", "cause", "effect", "energy", "true", "false", ...

I don't know what reasoning from scratch would look like without training on examples from other reasoning beings. As human children do.

dchichkov 546 days ago

There are examples of learning reasoning from scratch with reinforcement learning.

Emergent tool use from multi-agent interaction is a good example - https://openai.com/index/emergent-tool-use/

ipaddr 546 days ago

Now you are asking for a perfect modeling of the system. Reinforcement learning works by discovering boundaries.

tracker1 546 days ago

Now rediscover all the plants that are and aren't poisonous to most people.

dchichkov 545 days ago

I've suggested that long context should be included into the prompt.

In your particular case the prompt would look something like: <pubmed dump> what are the plants that aren't poisonous to most people?

A general reasoner would recover language and relevant world model from pubmed dump. And then would proceed to reason about it, to perform the task.

It doesn't look like a particularly efficient process.

Davidzheng 546 days ago

Actually, I also think it's possible. Start with the natural numbers axiom system. Form all valid sentences of increasing length. RL on a model to search for counterexamples or proofs. With sufficient compute, this should produce superhuman math performance (efficiency), even at compute parity.

MrLeap 546 days ago

I wonder how much discovery in math happens as a result of lateral thinking epiphanies. I.e., a mathematician is trying to solve a problem, their mind is open to inspiration, and something in nature, or their childhood, or a book synthesizes with their mental model and gives them the next node in their mental graph that leads to a solution and advancement.

In an axiomatic system, those solutions are checkable, but how discoverable are they when your search space starts from infinity? How much do you lose by disregarding the gritty reality and foam of human experience? It provides inspirational texture that helps mathematicians in the search at least.

Reality is a massive corpus of cause and effect that can be modeled mathematically. I think you're throwing the baby out with the bathwater if you even want to be able to do math in a vacuum. Maybe there is a self-optimization spider that can crawl up the axioms and solve all of math. I think you'll find that you can generate new math infinitely, and reality grounds it and provides the gravity to direct efforts towards things that are useful, meaningful and interesting to us.

soulofmischief 546 days ago

As I mentioned in a sister comment, Gödel's incompleteness theorems also throw a wrench into things, because you will be able to construct logically consistent "truths" that may not actually exist in reality. At which point, your model of reality becomes decreasingly useful.

At the end of the day, all theory must be empirically verified, and contextually useful reasoning simply cannot develop in a vacuum.

danenania 546 days ago

A question is: what algorithms does the brain use to make these creative lateral leaps? Are they replicable?

Unless the brain is using physics that we don’t understand or can’t replicate, it seems that, at least theoretically, there should be a way to model what it’s doing with silicon and code.

States like inspiration and creativity seem to correlate in an interesting way with ‘temperature’, ‘top p’, and other LLM inputs. By turning up the randomness and accepting a wider range of output, you get more nonsense, but you also potentially get more novel insights and connections. Human creativity seems to work in a somewhat similar way.
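
For anyone who hasn't looked at what those two knobs do, here is a minimal sketch of temperature scaling followed by top-p (nucleus) filtering, written in plain Python with a made-up `sample_token` helper rather than any particular library's sampler. It assumes `temperature > 0`.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token index after temperature scaling and nucleus filtering."""
    rng = rng or random.Random()

    # Higher temperature flattens the distribution; lower sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices renormalizes the weights of the surviving tokens.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 5.0, 1.0]
    print("conservative:", [sample_token(logits, 1.0, 0.5, rng) for _ in range(10)])
    print("adventurous: ", [sample_token(logits, 100.0, 1.0, rng) for _ in range(10)])
```

With a low `top_p` only the dominant token survives, so the output is predictable; with a high temperature and `top_p = 1.0` the unlikely tokens get a real chance, which is the "more nonsense, more novelty" trade-off.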

kmeisthax 546 days ago

https://en.wikipedia.org/wiki/Monstrous_moonshine#Origin_of_...

iczero 546 days ago

I believe https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_... (Gödel's incompleteness theorems) applies here

hmmm-i-wonder 546 days ago

Dogs are probably the best example I can think of. They learn through experience and clearly reason, but without a complex language to define abstract concepts. It's very basic reasoning, but they do learn and apply that learning.

To your point, experience is the training. Without language/data to represent human experience and knowledge to train a model, how would you give it 'experience'?

Nevermark 545 days ago

And yet dogs, to a very high degree, just learn the same things. At least the same kinds of things, over and over.

They were pre-designed to learn what they always learn. Their minds are structured to readily make, as puppies, the same connections that dogs have always needed to survive.

Not for real reasoning, which by its nature, does not have a limit.

hmmm-i-wonder 545 days ago

> just learn the same things. At least the same kinds of things, over and over.

It's easy to train the same things to a degree, but it's amazing to watch different dogs individually learn and reason through things completely differently, even within a breed or even a litter.

Reasoning ability is always limited by the capacity of the thinker to frame the concepts and interactions. It's always limited by definition; we only push that limit farther than other species, and AGI may eventually push it past our abilities.

soerxpso 546 days ago

There was necessarily a "first reasoning being" who learned reasoning from scratch, and then it's improved from there. Humans needed tens of thousands of years because:

- humans experience reality at a slower pace than AI could theoretically experience a simulated reality

- humans have to transfer knowledge to the next generation every 80 years (in a manner that's very lossy), and around half of each human lifespan is spent learning things that the previous generation already knew

addicted 546 days ago

The idea that there was “necessarily a first reasoning being” is neither obvious nor likely.

Reasoning could very well have originally been an emergent property of a group of beings.

The animal kingdom is full of examples of groups being more intelligent than individuals, including in human animals as of today.

It’s entirely possible that reasoning emerged as a property of a group before it emerged in any individual first.

carlob 546 days ago

I think you are focusing too much on the fact that a being needs to be an individual organism, which is kind of an implementation detail.

What I wonder instead is whether reasoning is a property that is either there or not there, with a sharp boundary of existence.

soerxpso 540 days ago

Whether the first reasoning entity is an individual organism or a group of organisms is completely irrelevant to the original point. If one were to grant that there was in fact a "first reasoning group" rather than a "first reasoning being" the original argument would remain intact.

butlike 546 days ago

Did it kill them?

y - must be unsafe

n - must be safe

Do this continually through generations until you arrive at modern society.

MrLeap 546 days ago

Creating reasoning from scratch is the same task as creating an apple pie from scratch.

First you must invent the universe.

psychoslave 546 days ago

>First you must invent the universe.

That was the easy part though; figuring out how to handle all the unintended side effects it generated is still an ongoing process. Please sit and relax while we are solving the few incidental events occurring here and there; rest assured we are putting our best effort into their resolution.

miki123211 546 days ago

It is possible to learn to reason from scratch, that's what R1-0 did, but the resulting chains of thought aren't legible to humans.

To quote DeepSeek directly:

> DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.

dchichkov 546 days ago

If you look at the benchmarks of the DeepSeek-V3-Base, it is quite capable, even in 0-shot: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base#base-mod... This is not from scratch. These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

On the other hand, my take on it, the ability to do reasoning in a long context is a general capability. And my guess is that it can be bootstrapped from scratch, without having to do training on all of the internet or having to distill models trained on the internet.

cma 546 days ago

> These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

But we already know that is the case: the DeepSeek-V3 paper says it was post-trained partly with an internal version of R1:

> Reasoning Data. For reasoning-related datasets, including those focused on mathematics, code competition problems, and logic puzzles, we generate the data by leveraging an internal DeepSeek-R1 model. Specifically, while the R1-generated data demonstrates strong accuracy, it suffers from issues such as overthinking, poor formatting, and excessive length. Our objective is to balance the high accuracy of R1-generated reasoning data and the clarity and conciseness of regularly formatted reasoning data.

And DeepSeekMath did a repeated cycle of this kind of thing, mixing in 10% of old, previously seen data with newly generated data from the last generation in a continuous bootstrap.
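
A rough sketch of that kind of mixing step, as described in this comment: build each round's training set so that about 10% of it is replayed old data. The `mix_with_replay` helper and its parameters are hypothetical, and this is not DeepSeekMath's code. It assumes `replay_fraction < 1`.

```python
import random
from typing import List, Optional, Sequence


def mix_with_replay(
    new_data: Sequence[str],
    old_data: Sequence[str],
    replay_fraction: float = 0.10,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return new_data plus enough old samples to make up ~replay_fraction of the mix."""
    rng = rng or random.Random()
    # Solve n_old / (n_old + n_new) = replay_fraction for n_old.
    n_old = round(len(new_data) * replay_fraction / (1.0 - replay_fraction))
    # Never ask for more old samples than exist.
    n_old = min(n_old, len(old_data))
    mixed = list(new_data) + rng.sample(list(old_data), n_old)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    new = [f"new{i}" for i in range(90)]
    old = [f"old{i}" for i in range(50)]
    batch = mix_with_replay(new, old, rng=random.Random(0))
    replayed = sum(x.startswith("old") for x in batch)
    print(len(batch), "examples,", replayed, "replayed from earlier rounds")
```

In a bootstrap loop, each round's output would be generated by the latest model, filtered, mixed like this, and used to train the next round.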

PeterStuer 546 days ago

Possible? I guess evolution did it over the course of a few billion years. For engineering purposes, starting from the best advanced position seems far more efficient.

soulofmischief 546 days ago

I've been giving this a lot of thought over the last few months. My personal insight is that "reasoning" is simply the application of a probabilistic reasoning manifold on an input in order to transform it into constrained output that serves the stability or evolution of a system.

This manifold is constructed via learning a decontextualized pattern space on a given set of inputs. Given the inherent probabilistic nature of sampling, true reasoning is expressed in terms of probabilities, not axioms. It may be possible to discover axioms by locating fixed points or attractors on the manifold, but ultimately you're looking at a probabilistic manifold constructed from your input set.

But I don't think you can untie this "reasoning" from your input data. It's possible you will find "meta-reasoning", or similar structures found in any sufficiently advanced reasoning manifold, but these highly decontextualized structures might be entirely useless without proper recontextualization, necessitating that a reasoning manifold is trained on input whose patterns follow learnable underlying rules, if the manifold is to be useful for processing input of that kind.

Decontextualization is learning, decomposing aspects of an input into context-agnostic relationships. But recontextualization is the other half of that, knowing how to take highly abstract, sometimes inexpressible, context-agnostic relationships and transform them into useful analysis in novel domains.

This doesn't mean a well-trained model can't reason about input it hasn't encountered before, just that the input needs to be in some way causally connected to the same laws which governed the input the manifold was trained on.

I'm sure we could create a fully generalized reasoning manifold which could handle anything, but I don't see how we possibly get that without first considering and encountering all possible inputs. But these inputs still have to have some form of constraint governed by laws that must be learned through sampling, otherwise you'd just be training on effectively random data.

The other commenter who suggested simply generating all possible sentences and training on internal consistency should probably consider Gödel's incompleteness theorems, and that internal consistency isn't enough to accurately model and interpret the universe. One could construct a thought experiment about an isolated brain in a jar with effectively unlimited neuronal connections, but no sensory connection to the outside world. It's possible, with enough connections, that the likelihood of the brain conceiving of true events it hasn't actually encountered does increase meaningfully. But the brain still has nothing to validate against, and can't simply assume that because something is internally logically consistent, that it must exist or have existed.

vkou 546 days ago

If OpenAI had to account for the cost of producing all the copyrighted material they trained their LLM on, their system would be worth negative trillions of dollars.

Let's just assume that the cost of training can be externalized to other people for free.

fakedang 545 days ago

Even if what OpenAI asserts in the title of this post is true, then their system is worth negative trillions of dollars.

If other players can access that data with relatively less effort, then it's futile trying to train your models and improve upon them, as clearly you don't have an architectural moat, just a training moat.

Kind of like an office scene where an introverted hardworker does all the tedious work, while his extroverted colleague promotes it as his and gains credit.

hmottestad 546 days ago

At the pace that DeepSeek is developing we should expect them to surpass OpenAI in not that long.

The big question really is: are we doing it wrong? Could we have created o1 for a fraction of the price? Will o4 cost less to train than o1 did?

The second question is, naturally: if we create a smarter LLM, can we use it to create another LLM that is even smarter?

It would have been fantastic if DeepSeek could have come out with an o3 competitor before o3 even became publicly available. That way we would have known for sure that we’re doing it wrong. Cause then either we could have used o1 to train a better AI or we could have just trained in a smarter and cheaper way.

pertymcpert 546 days ago

The whole discussion is about whether or not the second case of using o1 outputs to fine tune R1 is what allowed R1 to become so good. If that's the case then your assertion that DeepSeek will surpass OpenAI doesn't really make sense because they're dependent on a frontier model in order to match, not surpass.

hmottestad 546 days ago

Yeah, that's my point. If they do end up surpassing OpenAI then it would seem likely that they aren't just relying on copying from o1, or whatever model is the frontier model at that time.

cherry_tree 546 days ago

> I think it would cast doubt on the narrative "you could have trained o1 with much less compute, and r1 is proof of that"

Whether or not you could have, you can now.

SpaceManNabs 546 days ago

My question is: if DeepSeek R1 is just a distilled o1, I wonder if you can build a fine-tuned R1 through distillation without having to fine-tune o1.

zombiwoof 546 days ago

Exactly. They piggybacked off lots of compute and used less. There is still a massive total amount of compute involved.

cratermoon 546 days ago

OpenAI piggybacked on the whole internet and the catalogued and shared human knowledge therein.

fmbb 546 days ago

That’s a lot of watt hours!

PeterStuer 546 days ago

And let's not forget a gazillion hours of human reinforcement by armies of 3rd world mechanical turks.

bitfilped 542 days ago

Except OpenAI hasn't shared anything.

TeMPOraL 546 days ago

Sure. This is fine. Data is still a product, no matter how much businesses would like to turn it into a service.

The model already embodies the "total sum of a massive amount of compute" used to create it; if it's possible to reuse that embodied compute to create a better model, that's good for the world. Forcing everyone to redo all that compute for themselves is, conversely, bad for the world.

RHSman2 545 days ago

Nothing good for the world in this AI race, but your comment is very good.

da_chicken 546 days ago

I mean, yes that's how progress works. Has OpenAI got a patent? If not it's fair game.

We don't make people figure out how to domesticate a cow every time they want a hamburger. Or test hundreds of thousands of filaments before they can have a lightbulb. Inventions, once invented, exist as giants to stand upon. The inventor can either choose to disclose the invention and earn a patent for exclusive rights, or they can try to keep it a secret and hope nobody reverse engineers it.

philipwhiuk 545 days ago

You mean to create an apple pie from scratch you first have to invent the universe?