by just-ok 502 days ago
It’s not better than o1. And given that OpenAI is on the verge of releasing o3, has some “o4” in the pipeline, and Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply.

I’m excited to see models become open, but given the curve of progress we’ve seen, even being “a little” behind is a gap that grows exponentially every day.


When the price difference is so high and the performance so close, of course you have a major issue with competition. Let alone the fact this is fully open source.

Most importantly, this is a signal: OpenAI and Meta are trying to build a moat using massive hardware investments. Deepseek took the opposite direction, and not only does it show that hardware is no moat, it basically makes a fool of their multibillion claims. This is massive. If only investors had the brain it takes, we would have popped this bubble already.

Why should the bubble pop when we just got the proof that these models can be much more efficient than we thought?

I mean, sure, no one is going to have a monopoly, and we're going to see a race to the bottom in prices, but on the other hand, the AI revolution is going to come much sooner than expected, and it's going to be in everyone's pocket this year. Isn't that a bullish signal for the economy?

Chances are the investors who put in all that capital would rather invest it in the team that has the ability to make the most of it. Deepseek calls into question whether OpenAI, Anthropic or Google are as world class as everyone thought a few days ago.
It doesn’t call it into question: they’re not. OpenAI has been bleeding researchers since the Anthropic split (and arguably their best ones, given Claude vs GPT-4o). While Google should have all the data in the world to build the best models, they still seem organizationally incapable of leveraging it to their advantage, as was the case with their inventing Transformers in the first place.
> While Google should have all the data in the world to build the best models

They do have the best models. Two models made by Google share the first place on Chatbot Arena.

[1] https://lmarena.ai/?leaderboard

I'm not sure placing first in Chatbot Arena is proof of anything except being the best at Chatbot Arena; it's been shown that models that format things in a visually more pleasant way tend to win side-by-side comparisons.

In my experience doing actual work, not side-by-side comparisons, Claude wins outright as a daily workhorse for any and all technical tasks. Chatbot Arena may say Gemini is "better", but my reality of solving actual coding problems says Claude is miles ahead.

I think this is the correct take. There might be a small bubble burst initially after a bunch of US stocks retrace due to uncertainty. But in the long run this should speed up the proliferation of productivity gains unlocked by AI.
I think we should not underestimate one aspect: at the moment, a lot of hype is artificial (and despicable if you ask me). Anthropic says AI can double human lifespan in 10 years time; OpenAI says they have AGI around the corner; Meta keeps insisting on their model being open source when they in fact only release the weights. They think - maybe they are right - that they would not be able to get these massive investments without hyping things a bit, but deepseek's performance should call for things to be reviewed.
Based on reports from a16z the US Government likely wants to bifurcate the top-tier tech and bring it into DARPA, with clear rules for how capable anything can be that the public will be able to access.

I consider it unlikely that the new administration is philosophically different with respect to its prioritization of "national security" concerns.

> Anthropic says AI can double human lifespan in 10 years time;

That's not a crazy thing to say, at all.

Lots of AI researchers think that ASI is less than 5 years away.

> deepseek's performance should call for things to be reviewed.

Their investments, maybe. Their predictions of AGI? Those should be revised to be more optimistic.

I am a professor of Neurobiology; I know a thing or two about lifespan research. To claim that human lifespan can be doubled is crazy per se. To claim it can be done in 10 years by a system that does not even exist is even sillier.
But it took the deepseek team a few weeks to replicate something at least close to o1.

If people can replicate 90% of your product in 6 weeks you have competition.

Not only a few weeks, but more importantly, it was cheap.

The moat for these big models was always expected to be capital expenditure, with training costing billions. It's why companies like OpenAI are spending massively on compute: it's building a bigger moat (or trying to, at least).

If it can be shown, as seems to have been the case, that you can use smarts to make use of compute more efficiently and cheaply while achieving similar (or even better) results, the hardware moat buoyed by capital no longer exists.

I'm actually glad, though. An open-sourced version of these weights should ideally spur the type of innovation that Stable Diffusion did when theirs was released.

o1-preview was released Sep 12, 2024. So the DeepSeek team probably had a couple of months.
> Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply

And this is based on what exactly? OpenAI hides the reasoning steps, so training a model on o1 is very likely much more expensive (and much less useful) than just training it directly on a cheaper model.

Because before o1, literally no one was doing CoT-style test-time scaling. It is a new paradigm. The talking point back then was that LLMs had hit a wall.

R1's biggest contribution, IMO, is R1-Zero; I am fully sold that they don't need o1's output to be as good. But yeah, o1 is still the herald.

I don't think Chain of Thought in itself was a particularly big deal, honestly. It always seemed like the most obvious way to make AI "work". Just give it some time to think to itself, and then summarize and conclude based on its own responses.

Like, this idea always seemed completely obvious to me, and I figured the only reason why it hadn't been done yet is just because (at the time) models weren't good enough. (So it just caused them to get confused, and it didn't improve results.)

Presumably OpenAI were the first to claim this achievement because they had (at the time) the strongest model (+ enough compute). That doesn't mean CoT was a revolutionary idea, because imo it really wasn't. (Again, it was just a matter of having a strong enough model, enough context, enough compute for it to actually work. That's not an academic achievement, just a scaling victory.)

But the idea that the more tokens you allocate to CoT, the better the model gets at solving the problem is a revolutionary one. And a model self-correcting within its own CoT was first brought out by the o1 model.
Chain of Thought has been known since 2022 (https://arxiv.org/abs/2201.11903); we were just stuck in a world where we were dumping more data and compute into training instead of looking at other improvements.
CoT is a common technique, but the scaling law, that more test-time compute spent on CoT generation correlates with problem-solving performance, comes from o1.
> even being “a little” behind is a gap that grows exponentially every day

This theory has yet to be demonstrated. As yet, it seems open source just stays behind by about 6-10 months consistently.

> It’s not better than o1.

I thought that too before I used it to do real work.

Yes. It shines with real problems.