by aurareturn 510 days ago
New AI Lab: Trains a model using Deepseek's techniques on 2,000 GPUs

xAI/OpenAI/Anthropic/Google: Trains a model using Deepseek's techniques on 100,000 GPUs

I fail to see how this makes smaller companies competitive.


Depends on whether smaller companies can deliver sufficient results more cheaply than the larger ones. There are some indications of diminishing returns on investing in ever more power.

It's like how you don't need a 1000HP supercar to get around town; a 55HP sedan is fine for most folks.

Yep, that is my point. If the large-scale LLMs are not sufficiently better than the new crop of startups, I suspect the large firms will need to acquire the startups (that would be their response to lacking a moat). It's hard to buy everything and to know where to place your bets.

So you’re worried that LLMs have stopped scaling even though the biggest breakthrough from DeepSeek is scaling reinforcement learning without humans?

As I see it, I'm still waiting to see improvements in LLM performance. What I see so far is an improvement in computational efficiency (less hardware needed).

If general LLMs do not show continued performance improvement, then there is a lot of excess HW that needs to be utilized somehow. If LLMs continue to show performance improvement, then the hardware can be used more efficiently.

So how does Deepseek change anything for your view? What you wrote was true before Deepseek and their non-human reinforcement learning breakthrough.

Here are my thoughts. I am a little removed from some of these fields, so you will not hurt my feelings if you want to be blunt.

Ok, “What I wrote was true before Deepseek and their non-human reinforcement learning breakthrough”.

Right, I did make a general statement that could be applied both pre- and post-Deepseek. I think I get your point. But you asked: “So you’re worried that LLMs have stopped scaling even though the biggest breakthrough from DeepSeek is scaling reinforcement learning without humans?” I’m not worried about it, but I am waiting to see continued LLM performance improvement due to HW scaling as opposed to algorithmic improvements.

This seems to happen in other industries, such as weather forecasting: a model is developed and sucks up all the resources, a new model is developed, the HPC company goes bankrupt, a new supercomputer is purchased, the new model is rolled out, there are new performance gains, it sucks up all the computing resources, rinse and repeat. But it takes a long time to release a new upgrade of a model.

Now, regarding algorithmic improvements, I break them down into two areas: a) throughput/efficiency improvement (faster/more_efficient) and b) performance improvement (better predictions). I’m not following this as closely as I would like, but it seems Deepseek is more aligned with faster/more_efficient.

As for the statement I wrote, I think what has changed is the realization that, as applied to LLMs, investors have been neglecting the efficiency side of the problem. It seems many of the promises of performance around the corner are more investment/funding driven. The other option, focusing on efficiency, would leave more stakeholders with less wealth, so they have a natural bias to promote this technology.

So, with the Deepseek approach/revelation, my thoughts are:

Before Deepseek announcement:

Established/large LLM producers:
Resources: More hardware, data centers, power distribution, tax credits.
Challenges: Making a profit, establishing a true moat (but not publicly recognized).
Moat: Significant cost challenges to new market entrants limited who could compete.

New_entrant/small LLM producers:
Resources: License large-scale LLMs, tune them for bespoke requests.
Moat: Significant cost challenges to new market entrants limited who could compete.

HW producers:
NVIDIA is dominant with CUDA and established HW products.

After Deepseek announcement:

Established LLM producers:
The excess resources are called into question: more hardware, data centers, power distribution, tax credits.
Need to recognize and acquire new entrants that may pose a downstream challenge. This will be tough, as many companies will spring up.

New_entrant/small LLM producers:
Resources: New companies should sprout up as the barrier to entry is reduced.
Moat: Small vendors can specialize, or build a business to be acquired by a large LLM vendor.

HW producers:
NVIDIA is dominant with CUDA and established HW products.
Increased GPU demand and the high margins of established HW players will bring in new market entrants (competition).
Numerous small LLM producers will increase HW sales.
Margins are high with NVIDIA, so other HW suppliers will see this as an opportunity to enter the market.
Also, since NVIDIA margins are so high, large-scale LLM vendors will welcome competition.