| HN Mirror



	by alecco 195 days ago
	SemiAnalysis said it last week and AFAIK it wasn't denied. https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

4 comments

RossBencina 195 days ago

The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

alecco 195 days ago

My reading between the lines is that OpenAI's "GPT-5" is really a GPT-4 generation model. That is consistent with it being unimpressive. Not the leap forward Altman promised.

aswegs8 195 days ago

The only real change I noticed is that it self-censors more than GPT-4.

herbst 195 days ago

From what I can tell they just removed the psychosis component that was always telling you that you were right.

nbardy 195 days ago

This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller models. That's how you do distillation. It would be stupid not to use the biggest model you have in distillation, and a waste, since they have the weights.

They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed the GPT-4.5 run for whatever budget made sense, and then spent the rest on RL.

Sure they chose to not serve the large base models anymore for cost reasons.

But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model size, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve for inference without distillation.

In RL the efficiency is very different because you have to run inference on the model to draw online samples. So small models start to make more sense to scale.

Big model => distill => RL

That makes the most theoretical sense for training nowadays if you want to spend efficiently.

So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could go back to scaling if the returns justified it.
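As a rough illustration of the "distill" step in that pipeline, here is a minimal sketch of a soft-target distillation loss: the student is trained to match the teacher's temperature-softened output distribution. This is the generic textbook form, not a description of what OpenAI or Google actually run; the function names, the temperature value, and the toy logits are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher (soft targets)
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits for a single token position (hypothetical values).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is why a larger teacher is useful even if it is never served directly.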

barrell 195 days ago

My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older cutoff date than the newer 4o models

tim333 195 days ago

Cutoff dates seem to be Oct 2024 for GPT-4.5, and Jan 2025 for the Gemini models.

It kind of explains a coding issue I had with TradingView, who update their Pine Script thing quite frequently. ChatGPT seemed to have issues with v4 vs. v5.

binkHN 195 days ago

This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?

matwood 195 days ago

As others mentioned, 5T isn't money available to NVDA. It could leverage that to buy a TPU company in an all stock deal though.

The bigger issue is that entering a 'race' implies a race to the bottom.

I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margin (70% gross!!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware along with Amazon.

This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.

sigmoid10 195 days ago

Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company. And that is how they were able to build a gigantic moat, because no one else has even come close on the software side. Google is the only one who has had some success there, because they have also spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.

matwood 195 days ago

Are you saying that Google, Meta, Amazon, etc... can't do software? It's the bread and butter of these companies. The CUDA moat is important to hold off the likes of AMD, but hardware like TPUs for internal use or other big software makers is not a big hurdle.

Of course Huang will lean on the software being key because he sees the hardware competition catching up.

qdotme 195 days ago

Essentially, yes, they haven’t done deep software. Netflix probably comes closest amongst FAANG.

Google, Meta, Amazon do “shallow and broad” software. They are quick to capture new markets, they frequently repackage an open-source core and add a large amount of business logic to make it work, but they essentially follow the market cycles - they hire and lay off on a cycle of a few years, and the people who work there typically also jump around industries, thanks to both transferable skills and fairly competitive competitors.

NVDA is roughly in the same bucket as HFT vendors. They retain talent on 5-10 year timescales. They build software stacks that range from complex kernel drivers and hardware simulators all the way to optimizing compilers and acceleration libraries.

This means they can build more integrated, more optimal and more coherent solutions. Just like Tesla can build a more integrated vehicle than Ford.

musebox35 195 days ago

I have deep respect for CUDA and Nvidia engineering. However, the arguments above seem to totally ignore Google Search's indexing and query software stack. They are the king of distributed software and also of hardware that scales. That is why TPUs are a thing now and why they can compete with Nvidia where AMD failed. Distributed software is the bread and butter of Google, with a multi-decade investment from day zero out of necessity. When you have to update an index of an evolving set of billions of documents daily, and do that online while keeping subsecond query capability across the globe, that should teach you a few things about deep software stacks.

dumah 193 days ago

These companies innovate in all of those areas and direct those resources towards building hyper-scale custom infrastructure, including CPU, TPU, GPU, and custom networking hardware for the largest cloud systems, and conduct research and development on new compilers and operating system components to exploit them.

They're building it for themselves and employ world-class experts across the entire stack.

How can NVIDIA develop "more integrated" solutions when they are primarily building for these companies, as well as many others?

Examples of these companies doing things you mention as being somehow unique to or characteristic of NVIDIA:

Complex kernel drivers or modules:

- AWS: Nitro, ENA/EFA, Firecracker, NKI, bottlerocket

- Google: gasket/apex, gve, binder

- Meta: Katran, bpfilter, cgroup2, oomd, btrfs

Hardware simulators:

- AWS: Neuron, Annapurna builds simulations for nitro, graviton, inferentia and validates aws instances built for EDA services

- Google: Goldfish, Ranchu, Cuttlefish

- Meta: Arcadia, MTIA, CFD for thermal management

Optimizing Compilers:

- Amazon: NNVM, Neo-AI

- Google: MLIR, XLA, IREE

- Meta: Glow, Triton, LLM Compiler

Acceleration Libraries:

- Amazon: NeuronX, aws-ofi-nccl

- Google: Jax, TF

- Meta: FBGEMM, QNNPACK

bluelightning2k 195 days ago

You're suggesting Waymo isn't deep software? Or Tensorflow? Or Android? The Go programming language? Or MapReduce, AlphaGo, Kubernetes, the transformer, Chrome/Chromium or Gvisor?

You must have an amazing CV to think these are shallow projects.

danielscrubs 195 days ago

Well put. I haven’t thought about it like that.

thaumasiotes 195 days ago

But the first example sigmoid10 gave of a company that can't do software was Microsoft.

sigmoid10 195 days ago

Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said that their costs for software R&D and support are equal to or even bigger than those on the hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies have spent even close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.

whywhywhywhy 195 days ago

If CUDA were as trivial to replicate as you say then Nvidia wouldn’t be what it is today.

whatevaa 194 days ago

CUDA is not hard to replicate, but the network effects make it very hard to break through with a new product. Just like with everything where network effects apply.

Miraste 195 days ago

Meta makes websites and apps. Historically, they haven't succeeded at lower-level development. A somewhat recent example was when they tried to make a custom OS for their VR headsets, completely failed, and had to continue using Android.

dumah 193 days ago

You're generalizing a failure at delivering one consumer solution and ignoring the successful infrastructure research and development that occurs behind the scenes.

Meta builds hardware from chip to cluster to datacenter scale, and drives research into simulation at every scale, all the way to CFD simulation of datacenter thermal management.

coredog64 195 days ago

Remind me which company originated PyTorch?

sanjayjc 195 days ago

Genuine question: given LLMs' inexorable commoditization of software, how soon before NVDA's CUDA moat is breached too? Is CUDA somehow fundamentally different from other kinds of software or firmware?

tomrod 195 days ago

Current Gen LLMs are not breaching the moat yet.

fzzzy 195 days ago

Yeah they are. llama.cpp has had good performance on cpu, amd, and apple metal for at least a year now.

Glemkloksdjf 195 days ago

Nvidia has everything they need to build the most advanced GPU Chip in the world and mass produce it.

Everything.

They can easily just do this for more optimized Chips.

"Easily" in the sense that it wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Omniverse and their Isaac robotics platform are both expensive. Nvidia has 10x more software engineers than AMD.

farseer 195 days ago

They still go to TSMC for fab, and so does everyone else.

Glemkloksdjf 195 days ago

For sure. But they also have high volume and know how to do everything.

Also certain companies normally don't like to do things themselves if they don't have to.

Nonetheless, Nvidia is where it is because it has CUDA and an ecosystem. Everyone uses this ecosystem and then you just run that stuff on the bigger version of the same ecosystem.

dragonwriter 195 days ago

> What's to stop them from entering the TPU race with their $5 trillion valuation?

Valuation isn’t available money; they'd have to raise more money in the current, probably tighter for them, investment environment to enter the TPU race, since the money they have already raised that that valuation is based on is already needed to provide runway for what they are already doing without putting money into the TPU race

captainbland 195 days ago

Nvidia is already in the TPU race aren't they? This is exactly what the tensor cores on their current products are supposed to do, but they're just more heterogeneous GPU based architectures and exist with CUDA cores etc. on the same die. I think it should be within their capability to make a device which devotes an even higher ratio of transistors to tensor processing.

sysguest 195 days ago

$5 trillion valuation doesn't mean it has $5 trillion cash in pocket -- so "it depends"

randomNumber7 195 days ago

If you look at the history of how GPUs evolved:

1. There used to be fixed-function hardware for certain graphics stages.

2. Programmable massively parallel hardware took over. Nvidia was at the forefront of this.

TPUs seem to me similar to fixed-function hardware. For Nvidia it's a step backwards, and even though they have moved in this direction recently, I can't see them going all the way.

Otherwise you don't need CUDA, but hardware guys who write Verilog or VHDL. They don't have that much of an edge there.

herbst 195 days ago

Why dig for gold when you are the gold standard for the shovel already?

CamperBob2 195 days ago

That is.... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.

seatac76 195 days ago

SemiAnalysis is great; they typically cover semiconductors, but the reporting is top-notch.

lanstin 195 days ago

Wow, that was a good article. So much detail from financial to optical linking to build various data flow topologies. Makes me less aghast at the $10M salaries for the masters of these techniques.

Numerlor 195 days ago

This article about them got published just yesterday... https://news.ycombinator.com/item?id=46124883

There's a lot of misleading information in what they publish, plagiarism, and I believe some information that wouldn't be possible to get without breaking NDAs

girvo 195 days ago

> I believe some information that wouldn't be possible to get without breaking NDAs

…why would I care about this in the slightest?

ipnon 195 days ago

Dylan Patel founded SemiAnalysis and he has a great interview with Satya Nadella on Dwarkesh Patel's podcast.

CSMastermind 195 days ago

SemiAnalysis is great, def recommend following

rahimnathwani 195 days ago

Dylan Patel joined Dwarkesh recently to interview Satya Nadella: https://www.dwarkesh.com/p/satya-nadella-2

embedding-shape 195 days ago

And this is relevant how? That interview is 1.5 hours, not something you just casually drop a link to and say "here, listen to this to even understand what point I was trying to make"

rahimnathwani 195 days ago

Sorry, this was meant to be a reply to this comment: https://news.ycombinator.com/item?id=46127942

I was trying to make the point that SemiAnalysis is semi-famous.

tim333 195 days ago

The video is interesting, showing Microsoft's latest data center and Nadella talking about it. I preferred the YouTube version https://youtu.be/8-boBsWcr5A

kovezd 195 days ago

You can now ask Gemini about a video. Very useful!

andai 195 days ago

I have a few lines of "download subtitles with yt-dlp", "remove the VTT crap", and "shove it into llm with a summarization prompt and/or my question appended", but I mostly use Gemini for that now. (And I use it for basically nothing else, oddly enough. They just have the monopoly on access to YouTube transcripts ;)
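For anyone curious, the "remove the VTT crap" and "append a prompt" steps can be sketched roughly like this. It assumes the subtitle file has already been downloaded as WebVTT text (for example with yt-dlp's subtitle options) and leaves the actual LLM call out, since that depends on whichever client you use. `vtt_to_text` and `build_prompt` are hypothetical helper names, not part of yt-dlp or any library.

```python
import re

# Inline cue tags such as <c>, </c> or <00:00:01.000> word timings.
TAG_RE = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Reduce a WebVTT subtitle file to plain transcript text."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Drop blank lines, the file header, and cue timing lines.
        if not line or line.startswith("WEBVTT") or "-->" in line:
            continue
        # Drop metadata headers, comment markers, and numeric cue ids.
        if line.startswith(("Kind:", "Language:", "NOTE")) or line.isdigit():
            continue
        line = TAG_RE.sub("", line).strip()
        # Auto-generated captions often repeat the previous line; skip repeats.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)


def build_prompt(transcript: str, question: str = "") -> str:
    """Put the question (or a default summarization request) before the transcript."""
    task = question or "Summarize the key points of this transcript."
    return f"{task}\n\nTranscript:\n{transcript}"


sample = (
    "WEBVTT\nKind: captions\nLanguage: en\n\n"
    "00:00:00.000 --> 00:00:02.000\nhello<00:00:01.000><c> world</c>\n\n"
    "00:00:02.000 --> 00:00:04.000\nhello world\nsecond line\n"
)
prompt = build_prompt(vtt_to_text(sample))
```

The resulting `prompt` string is what would be sent to the model; long videos may still need chunking to fit a context window.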

embedding-shape 195 days ago

<insert link to 2 hour long YouTube video>

That's my reply. I assume everyone who wants to know my point has access to an LLM that can summarize videos.

Is this how internet communication is supposed to be now?