by alecco 195 days ago
SemiAnalysis said it last week and AFAIK it wasn't denied.

https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

4 comments

The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

My reading between the lines is that OpenAI's "GPT-5" is really a GPT-4 generation model. That is consistent with it being unimpressive, and not the leap forward Altman promised.
The only real change I noticed is that it self-censors more than GPT-4.
From what I can tell they just removed the psychosis component that was always telling you that you were right.
This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller models. That's how you do distillation. It would be stupid not to use the biggest model you have in distillation, and a waste, since they have the weights.

They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed the GPT-4.5 run for whatever budget made sense, and then spent the rest on RL.

Sure they chose to not serve the large base models anymore for cost reasons.

But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve in inference without distillation.

In RL the efficiency is very different because you have to inference sample the model to draw online samples. So small models start to make more sense to scale.

Big model => distill => RL

Makes the most theoretical sense for training nowadays if you want to spend efficiently.

So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
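
To make the "distill" step concrete, here is a minimal sketch of the textbook soft-target distillation loss: the student is trained to match the teacher's temperature-softened output distribution. Nothing in this thread says how OpenAI or Google actually distill, so treat this as a generic illustration; the function names, the temperature value, and the toy logits are all made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the classic soft-target loss; real pipelines usually mix it
    with a hard-label loss and run it over batches of token positions.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy next-token logits over a 3-token vocabulary (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that roughly agrees with the teacher gets a much smaller loss than the one that prefers a different token, which is the signal the small model is trained on before any RL happens.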

My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older cutoff date than the newer 4o models.
Cutoff dates seem to be Oct 2024 for GPT-4.5, and Jan 2025 for the Gemini models.

It kind of explains a coding issue I had with TradingView, who update their Pine Script thing quite frequently. ChatGPT seemed to have issues with v4 vs. v5.

This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?
As others mentioned, 5T isn't money available to NVDA. It could leverage that to buy a TPU company in an all stock deal though.

The bigger issue is that entering a 'race' implies a race to the bottom.

I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margin (70% gross!!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware along with Amazon.

This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.

Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company. And that is how they were able to build a gigantic moat, because no one else has even come close on the software side. Google is the only one who has had some success on this side, because they also spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.
Are you saying that Google, Meta, Amazon, etc... can't do software? It's the bread and butter of these companies. The CUDA moat is important to hold off the likes of AMD, but hardware like TPUs for internal use or other big software makers is not a big hurdle.

Of course Huang will lean on the software being key because he sees the hardware competition catching up.

Essentially, yes, they haven’t done deep software. Netflix probably comes closest amongst FAANG.

Google, Meta, Amazon do “shallow and broad” software. They are quite fast at capturing new markets, they frequently repackage an open-source core and add the large amount of business logic needed to make it work, but they essentially follow the market cycles: they hire and lay off on a few-year cycle, and the people who work there typically also jump around industries, thanks to both transferable skills and plenty of competitors bidding for them.

NVDA is roughly in the same bucket as HFT vendors. They retain talent on 5-10 year timescales. They build software stacks that range from complex kernel drivers and hardware simulators all the way to optimizing compilers and acceleration libraries.

This means they can build more integrated, more optimal and more coherent solutions. Just like Tesla can build a more integrated vehicle than Ford.

I have deep respect for CUDA and Nvidia engineering. However, the arguments above seem to totally ignore Google Search's indexing and query software stack. They are the king of distributed software and also of hardware that scales. That is why TPUs are a thing now and why they can compete with Nvidia where AMD failed. Distributed software is the bread and butter of Google, with their multi-decade investment from day zero out of necessity. When you have to update an index of an evolving set of billions of documents daily, and do that online while keeping subsecond query capability across the globe, that should teach you a few things about deep software stacks.
These companies innovate in all of those areas and direct those resources towards building hyper-scale custom infrastructure, including CPU, TPU, GPU, and custom networking hardware for the largest cloud systems, and conduct research and development on new compilers and operating system components to exploit them.

They're building it for themselves and employ world-class experts across the entire stack.

How can NVIDIA develop "more integrated" solutions when they are primarily building for these companies, as well as many others?

Examples of these companies doing things you mention as being somehow unique to or characteristic of NVIDIA:

Complex kernel drivers or modules:

- AWS: Nitro, ENA/EFA, Firecracker, NKI, bottlerocket

- Google: gasket/apex, gve, binder

- Meta: Katran, bpfilter, cgroup2, oomd, btrfs

Hardware simulators:

- AWS: Neuron, Annapurna builds simulations for nitro, graviton, inferentia and validates aws instances built for EDA services

- Google: Goldfish, Ranchu, Cuttlefish

- Meta: Arcadia, MTIA, CFD for thermal management

Optimizing Compilers:

- Amazon: NNVM, Neo-AI

- Google: MLIR, XLA, IREE

- Meta: Glow, Triton, LLM Compiler

Acceleration Libraries:

- Amazon: NeuronX, aws-ofi-nccl

- Google: JAX, TF

- Meta: FBGEMM, QNNPACK

You're suggesting Waymo isn't deep software? Or TensorFlow? Or Android? The Go programming language? Or MapReduce, AlphaGo, Kubernetes, the transformer, Chrome/Chromium or gVisor?

You must have an amazing CV to think these are shallow projects.

Well put. I haven’t thought about it like that.
But the first example sigmoid10 gave of a company that can't do software was Microsoft.
Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said that their costs for software R&D and support are equal to or even bigger than those on their hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies have spent even close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.
If CUDA were as trivial to replicate as you say then Nvidia wouldn’t be what it is today.
CUDA is not hard to replicate, but the network effects make it very hard to break through with a new product. Just like with everything where network effects apply.
Meta makes websites and apps. Historically, they haven't succeeded at lower-level development. A somewhat recent example was when they tried to make a custom OS for their VR headsets, completely failed, and had to continue using Android.
You're generalizing a failure at delivering one consumer solution and ignoring the successful infrastructure research and development that occurs behind the scenes.

Meta builds hardware from chip to cluster to datacenter scale, and drives research into simulation at every scale, all the way to CFD simulation of datacenter thermal management.

Remind me which company originated PyTorch?
Genuine question: given LLMs' inexorable commoditization of software, how soon before NVDA's CUDA moat is breached too? Is CUDA somehow fundamentally different from other kinds of software or firmware?
Current Gen LLMs are not breaching the moat yet.
Yeah they are. llama.cpp has had good performance on cpu, amd, and apple metal for at least a year now.
Nvidia has everything they need to build the most advanced GPU Chip in the world and mass produce it.

Everything.

They can easily just do this for more optimized Chips.

"Easily" in the sense that it wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Omniverse and their Isaac robotics platform are both expensive. Nvidia has 10x more software engineers than AMD.

They still go to TSMC for fab, and so does everyone else.
For sure. But they also have high volume and know how to do everything.

Also certain companies normally don't like to do things themselves if they don't have to.

Nonetheless, Nvidia is where it is because it has CUDA and an ecosystem. Everyone uses this ecosystem and then you just run that stuff on the bigger version of the same ecosystem.

> What's to stop them from entering the TPU race with their $5 trillion valuation?

Valuation isn’t available money; they'd have to raise more money in the current, probably tighter for them, investment environment to enter the TPU race, since the money they have already raised that that valuation is based on is already needed to provide runway for what they are already doing without putting money into the TPU race

Nvidia is already in the TPU race, aren't they? This is exactly what the tensor cores on their current products are supposed to do, but they're just more heterogeneous GPU-based architectures, and the tensor cores sit alongside CUDA cores etc. on the same die. I think it should be within their capability to make a device which devotes an even higher ratio of transistors to tensor processing.
$5 trillion valuation doesn't mean it has $5 trillion cash in pocket -- so "it depends"
If you look at the history how GPUs evolved:

1. There used to be fixed-function hardware for certain graphics stages.

2. Programmable massively parallel hardware took over. Nvidia was at the forefront of this.

TPUs seem to me similar to fixed-function hardware. For Nvidia it's a step backwards, and even though they have been going in this direction recently, I can't see them going all the way.

Otherwise you don't need CUDA, but hardware guys who write Verilog or VHDL. They don't have that much of an edge there.

Why dig for gold when you are the gold standard for the shovel already?
That is.... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.
SemiAnalysis is great. They typically cover semiconductors, and their reporting is top-notch.
Wow, that was a good article. So much detail from financial to optical linking to build various data flow topologies. Makes me less aghast at the $10M salaries for the masters of these techniques.
This article about them got published just yesterday... https://news.ycombinator.com/item?id=46124883

There's a lot of misleading information in what they publish, plagiarism, and I believe some information that wouldn't be possible to get without breaking NDAs

> I believe some information that wouldn't be possible to get without breaking NDAs

…why would I care about this in the slightest?

Dylan Patel founded Semianalysis and he has a great interview with Satya Nadella on Dwarkesh Patel's podcast.
Semianalysis is great, def recommend following
Dylan Patel joined Dwarkesh recently to interview Satya Nadella: https://www.dwarkesh.com/p/satya-nadella-2
And this is relevant how? That interview is 1.5 hours, not something you just casually drop a link to and say "here, listen to this to even understand what point I was trying to make"
Sorry, this was meant to be a reply to this comment: https://news.ycombinator.com/item?id=46127942

I was trying to make the point that SemiAnalysis is semi-famous.

The video is interesting, showing Microsoft's latest data center and Nadella talking about it. I preferred the YouTube version: https://youtu.be/8-boBsWcr5A
You can now ask Gemini about a video. Very useful!
I have a few lines of "download subtitles with yt-dlp", "remove the VTT crap", and "shove it into llm with a summarization prompt and/or my question appended", but I mostly use Gemini for that now. (And I use it for basically nothing else, oddly enough. They just have the monopoly on access to YouTube transcripts ;)
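
For anyone who wants the "few lines" version, here is a rough sketch of the middle step: stripping the VTT crap and gluing a question onto the transcript. It is my own guess at such a script, not the commenter's. `vtt_to_text` and `build_prompt` are hypothetical helper names, the sample captions are made up, and the cleanup is tuned to YouTube-style auto-captions (it does not handle cue identifiers or multi-line NOTE blocks). The yt-dlp command in the comment is one plausible way to fetch the subtitles; the actual LLM call is left out on purpose.

```python
import re

# Fetching the subtitles is a separate step, for example (assumed invocation):
#   yt-dlp --skip-download --write-auto-subs --sub-langs en --sub-format vtt <URL>

# Cue timing lines look like "00:00:02.000 --> 00:00:04.000 align:start ..."
TIMESTAMP = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Inline tags such as <c>, </c> and per-word timestamps like <00:00:00.500>
INLINE_TAG = re.compile(r"<[^>]+>")

def vtt_to_text(vtt: str) -> str:
    """Reduce a WebVTT caption file to plain text, dropping repeated lines."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line == "WEBVTT":
            continue
        if line.startswith(("WEBVTT ", "Kind:", "Language:", "NOTE")):
            continue  # header metadata and single-line notes
        if TIMESTAMP.match(line):
            continue  # cue timing line
        line = INLINE_TAG.sub("", line).strip()
        # Auto-captions repeat the previous line in the next cue; keep one copy.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)

def build_prompt(transcript: str, question: str) -> str:
    """Assemble the text to hand to whatever LLM you use (hypothetical format)."""
    return f"Transcript:\n{transcript}\n\nQuestion: {question}"

sample = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:02.010 align:start position:0%
hello world

00:00:02.010 --> 00:00:04.000 align:start position:0%
hello world
this<00:00:02.500><c> is</c><00:00:03.000><c> a</c><c> test</c>
"""

print(vtt_to_text(sample))
print(build_prompt(vtt_to_text(sample), "What is the main point?"))
```

Deduplicating only consecutive lines is a deliberate shortcut: it removes the rolling repeats in auto-captions without collapsing phrases the speaker genuinely says twice later on.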
<insert link to 2 hour long YouTube video>

That's my reply. I assume everyone who wants to know my point has access to an LLM that can summarize videos.

Is this how internet communication is supposed to be now?