Hacker News
by cocoggu 502 days ago
Can you please explain to me why AI chips don't matter anymore because of DeepSeek? I thought it was just a better model, but perhaps I didn't get it?
2 comments

DeepSeek used older-generation chips and developed a model that takes significantly less compute, making access to tons of the latest Nvidia hardware unnecessary.
This does not make sense. If R1 scales similarly to other GPTs, throwing 100x more compute at it will produce an even stronger model.
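The "throw 100x more compute at it" argument can be made concrete under a power-law scaling assumption. This is only a sketch: it assumes loss $L$ falls as a power of training compute $C$, with a scale constant $a$ and an illustrative exponent $\alpha = 0.05$ (an assumed value, not something measured for R1 in this thread).

```latex
L(C) = a\,C^{-\alpha}
\quad\Longrightarrow\quad
\frac{L(100\,C)}{L(C)} = 100^{-\alpha} = e^{-\alpha \ln 100}
\qquad
\alpha = 0.05:\;\; e^{-0.05 \times 4.605} = e^{-0.230} \approx 0.794
```

Under that assumption, 100x the compute cuts loss by roughly 21%, and a more compute-efficient method only changes $a$ (or the effective $C$), so it still benefits from extra hardware rather than replacing it.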
Being forced to live with more HW restrictions usually results in more reliance on SW creativity and better optimizations instead of lazy developers bloating SW to fill all available resources.

It's no surprise, then, that websites developed where everyone has the latest and greatest fully loaded M-series MacBooks also suffer from a horrible lack of optimization. "It works on my machine," while being a stuttery mess everywhere else.

Websites and the like are a different world from ML training, where devs seem to be more performance-conscious. But there's a weird reliance on CUDA because devs (rightfully) don't trust the alternatives.
Yeah, I've seen the same with PC gaming. Minimum specs were absolutely exploding for no real reason other than the fact that most gamers were buying the latest top-tier cards. Then the Steam Deck came out, and devs were forced to consider the fact that a 2D pixel art game shouldn't be lagging on a SoC capable of producing stunning 3D graphics in a properly optimized game.
Well, the Steam Deck's release coincided with the popularity of reconstruction techniques. The Deck didn't "force" devs to consider optimization so much as it just gave them a low-end reconstruction target to play with. Without FSR and XeSS, there's no doubt that the Deck would be a solidly last-gen console.

Strictly speaking, a lot of games really shouldn't be playable on the Steam Deck. Baldur's Gate III and Cyberpunk 2077 are both CPU-bound before reaching 60fps and can barely keep their head above 30fps running at 360p internal resolution. The Deck's saving grace is that it can tap into the same dynamic resolution mode that last-gen consoles depend on for consistent framerates.
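The dynamic resolution idea mentioned above can be sketched as a small feedback controller. This is a toy illustration, not any engine's or Valve's actual implementation: the `DynamicResolution` class, its 30 fps target, and the 0.45 minimum scale are all hypothetical choices (0.45 of the Deck's 1280x800 panel happens to give the 360p internal resolution mentioned above). It assumes GPU frame time grows with pixel count, so it adjusts the per-axis scale by the square root of the time ratio.

```python
from dataclasses import dataclass


@dataclass
class DynamicResolution:
    """Toy dynamic-resolution controller (hypothetical, for illustration)."""
    target_frame_ms: float = 1000.0 / 30.0  # aim for a steady 30 fps
    min_scale: float = 0.45                 # 0.45 * 800 = 360 lines on a 1280x800 panel
    max_scale: float = 1.0                  # native resolution
    scale: float = 1.0                      # current per-axis render scale

    def update(self, gpu_frame_ms: float) -> float:
        """Adjust the render scale after a frame that took gpu_frame_ms on the GPU."""
        if gpu_frame_ms <= 0:
            raise ValueError("frame time must be positive")
        # Assume GPU cost is proportional to pixel count (scale**2), so the
        # per-axis scale changes by the square root of the time ratio.
        ratio = self.target_frame_ms / gpu_frame_ms
        proposed = self.scale * ratio ** 0.5
        # Clamp so the image never drops below the floor or exceeds native.
        self.scale = max(self.min_scale, min(self.max_scale, proposed))
        return self.scale


def internal_resolution(native_w: int, native_h: int, scale: float) -> tuple[int, int]:
    """Resolution actually rendered before upscaling (e.g. by FSR)."""
    return round(native_w * scale), round(native_h * scale)


if __name__ == "__main__":
    dr = DynamicResolution()
    # A 50 ms frame misses the 33.3 ms budget, so the scale drops to about 0.82.
    s = dr.update(50.0)
    print(s, internal_resolution(1280, 800, s))
```

Note that this only helps when the GPU is the bottleneck. As the comment points out, games that are CPU-bound gain little from rendering fewer pixels.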

It's not that they don't matter anymore.

But there has been a long-term suspicion in the AI community that the humongous-LLM approach, ultra expensive to train and very expensive to run, is a dead end, or at least entirely unnecessary (and as such a dead end monetarily).

I mean, think about it: the crown jewel of AI was never finding ways to train on insane amounts of data, but getting the best possible results with as much data as necessary and no more. For a lot of use cases there simply isn't that much data.

And from everything we know, the structure of language is not so complex that you need this insane amount of data and model size.

It's just that we worked around problems by throwing more compute and data at them instead of solving them properly. Similarly, we try to reformulate any little-data use case in a way that lets us take advantage of the masses of "causal text" data modern foundational LLMs were trained on, then fine-tune and instrument the model using the "little data" of the use case.

But conceptually this is... subpar and undesirable. And sure, the fact that we made it work with this trickery is quite magnificent.

And sure, these huge LLMs do more than encode language; they encode miscellaneous knowledge/data, too.

But it's a messy, hallucination-prone, not properly updatable, and potentially outright copyright- or privacy-law-violating encoding of data...

So many systems already use RAG-like approaches to supply the knowledge in an updatable, much more well-defined form, and "only" use the LLM to find the right search queries and combine the results into human-readable responses.
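The RAG-style split described above (a knowledge store holds the facts; the language model only writes queries and composes answers) can be sketched in a few lines. This is a hypothetical toy, not any real RAG library: `make_search_query` and `compose_answer` are plain-Python stand-ins for the two LLM calls, and retrieval is naive keyword overlap rather than the embedding search real systems typically use.

```python
import re

# Words ignored when turning a question into search terms (toy list).
STOPWORDS = {"the", "a", "an", "is", "are", "what", "when", "does", "do",
             "of", "on", "for", "to", "how"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_search_query(question: str) -> list[str]:
    """Stand-in for the LLM's first job: turning a question into search terms."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def retrieve(query: list[str], store: dict[str, str], k: int = 1) -> list[str]:
    """Rank documents by how many query terms they contain (toy keyword search)."""
    def score(text: str) -> int:
        words = set(tokenize(text))
        return sum(1 for term in query if term in words)

    ranked = sorted(store.values(), key=score, reverse=True)
    return [doc for doc in ranked[:k] if score(doc) > 0]


def compose_answer(question: str, passages: list[str]) -> str:
    """Stand-in for the LLM's second job: combining passages into a reply."""
    if not passages:
        return "I couldn't find anything about that."
    return "Based on the knowledge base: " + " ".join(passages)


def answer(question: str, store: dict[str, str]) -> str:
    return compose_answer(question, retrieve(make_search_query(question), store))


if __name__ == "__main__":
    # Hypothetical knowledge base; editing it changes answers with no retraining.
    kb = {
        "hours": "The support desk opens at 9am on weekdays.",
        "refunds": "Refunds are processed within 14 days.",
    }
    print(answer("When does the support desk open?", kb))
    kb["hours"] = "The support desk opens at 8am on weekdays."
    print(answer("When does the support desk open?", kb))
```

The point of the design is the one made in the comment: updating a fact means editing the store, not retraining or fine-tuning the model.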

In turn, the moment we have small LLMs that still handle language structure well, they will very likely win out for a lot of reasons (the ones mentioned above, plus being much cheaper), even though they are _way_ more complicated to use than "just prompting an LLM". But most advanced assistants are already way more complicated than "just prompting an LLM" anyway.

Or, in other words, the technical breakthrough everyone (including OpenAI) would like most (OpenAI: financially, as long as it's an internal secret) is one that eliminates the need for the latest bleeding-edge ML chips. And DeepSeek is seen by some as a signal that exactly such a change is coming. I have also heard rumors (which I don't believe) that one reason OpenAI went non-open was that they realized this, too: with cheap-to-run open models, they would lose the competitive advantage of competitors being unable to train from scratch even if they wanted to.