by binkHN 195 days ago
This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?
6 comments

As others mentioned, 5T isn't money available to NVDA. It could leverage that to buy a TPU company in an all stock deal though.

The bigger issue is that entering a 'race' implies a race to the bottom.

I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margin (70% gross!!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware along with Amazon.

This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will mean a bad day for the stock and for the S&P.

Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company. And that is how they were able to build a gigantic moat, because no one else has even come close on the software side. Google is the only one that has had some success there, because they have also spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.
Are you saying that Google, Meta, Amazon, etc... can't do software? It's the bread and butter of these companies. The CUDA moat is important to hold off the likes of AMD, but hardware like TPUs for internal use or other big software makers is not a big hurdle.

Of course Huang will lean on the software being key because he sees the hardware competition catching up.

Essentially, yes, they haven’t done deep software. Netflix probably comes closest amongst FAANG.

Google, Meta, Amazon do “shallow and broad” software. They are quite fast at capturing new markets, and they frequently repackage an open-source core and add a large amount of business logic to make it work, but they essentially follow the market cycles: they hire and lay off on a cycle of a few years, and the people who work there typically also jump around industries, thanks to both transferable skills and relatively competitive competitors.

NVDA is roughly in the same bucket as HFT vendors. They retain talent on 5-10 year timescales. They build software stacks that range from complex kernel drivers and hardware simulators all the way to optimizing compilers and acceleration libraries.

This means they can build more integrated, more optimal and more coherent solutions. Just like Tesla can build a more integrated vehicle than Ford.

I have deep respect for CUDA and Nvidia engineering. However, the arguments above seem to totally ignore Google's Search indexing and query software stack. They are the king of distributed software and also of hardware that scales. That is why TPUs are a thing now and why they can compete with Nvidia where AMD failed. Distributed software is the bread and butter of Google, with a multi-decade investment made from day zero out of necessity. When you have to update an index of an evolving set of billions of documents daily, and do that online while keeping subsecond query capability across the globe, that should teach you a few things about deep software stacks.
These companies innovate in all of those areas and direct those resources towards building hyper-scale custom infrastructure, including CPU, TPU, GPU, and custom networking hardware for the largest cloud systems, and conduct research and development on new compilers and operating system components to exploit them.

They're building it for themselves and employ world-class experts across the entire stack.

How can NVIDIA develop "more integrated" solutions when they are primarily building for these companies, as well as many others?

Examples of these companies doing things you mention as being somehow unique to or characteristic of NVIDIA:

Complex kernel drivers or modules:

- AWS: Nitro, ENA/EFA, Firecracker, NKI, bottlerocket

- Google: gasket/apex, gve, binder

- Meta: Katran, bpfilter, cgroup2, oomd, btrfs

Hardware simulators:

- AWS: Neuron, Annapurna builds simulations for nitro, graviton, inferentia and validates aws instances built for EDA services

- Google: Goldfish, Ranchu, Cuttlefish

- Meta: Arcadia, MTIA, CFD for thermal management

Optimizing Compilers:

- Amazon: NNVM, Neo-AI

- Google: MLIR, XLA, IREE

- Meta: Glow, Triton, LLM Compiler

Acceleration Libraries:

- Amazon: NeuronX, aws-ofi-nccl

- Google: Jax, TF

- Meta: FBGEMM, QNNPACK

You're suggesting Waymo isn't deep software? Or TensorFlow? Or Android? The Go programming language? Or MapReduce, AlphaGo, Kubernetes, the transformer, Chrome/Chromium or gVisor?

You must have an amazing CV to think these are shallow projects.

No, I just realize these for what they are - reasonable projects at the exploitation (rather than exploration) stage of any industry.

I’d say I have an average CV in the EECS world, but also a relatively humble perspective on what is and isn’t bleeding edge. And as the industry expands, the volume “inside” the bleeding edge is exploitation, while the surface is the exploration.

Waymo? Maybe, but that’s an acquisition and they haven’t done much deep work since. TensorFlow is a handy and very useful DSL, but one that is shallow (it builds heavily on CUDA, TPUs, etc.); Android is another acquisition, with rather incremental growth since; Go is an nth C-like language (so neither Dennis Ritchie nor Bjarne Stroustrup level work); MapReduce is a darn common concept in HPC (SGI had libraries for it in the 1990s) and the implementation was pretty average. AlphaGo: another acquisition, and not much deep work since; Kubernetes is a layer over Linux namespaces to solve, well, shallow and broad problems; Chrome/Chromium is the 4th major browser that reached dominance, and essentially anyone with 1B to spare can build one. gVisor is another thin, shallow layer.

What I mean by deep software is a product that requires 5-10 years of work before it is useful, that touches multiple layers of the software stack (ideally all of them, from hardware to application), etc. But these types of jobs are relatively rare in the 2020s software world (though pretty common in robotics and new space). They were common in the 1990s, where I got my calibration values ;) Netscape and the Palm Pilot were a “whoa”. Chromium and Android are evolutions.

Well put. I haven’t thought about it like that.
But the first example sigmoid10 gave of a company that can't do software was Microsoft.
Yeah I'm not convinced Microsoft can do software anymore. I think they're a shambling mess of a zombie software company with enough market entropy to keep going for a long time.
Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said that their costs for software R&D and support are equal to or even bigger than those on their hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies have spent even close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.
If CUDA were as trivial to replicate as you say then Nvidia wouldn’t be what it is today.
CUDA is not hard to replicate, but the network effects make it very hard to break through with a new product, just like with everything where a network effect applies.
Meta makes websites and apps. Historically, they haven't succeeded at lower-level development. A somewhat recent example was when they tried to make a custom OS for their VR headsets, completely failed, and had to continue using Android.
You're generalizing a failure at delivering one consumer solution and ignoring the successful infrastructure research and development that occurs behind the scenes.

Meta builds hardware from chip to cluster to datacenter scale, and drives research into simulation at every scale, all the way to CFD simulation of datacenter thermal management.

More than one failure. They had a project to make a custom chip for model training a few years ago, and they scrapped it. Now they have another one, which entered testing in March. I don't think it's going well, because testing should have wrapped up recently, right before the news that they're in serious talks to buy a lot of TPUs from Google. On the other side of the stack, Llama 4 was a disaster and they haven't shipped anything since.

They have the money and talent to do it. As you point out, they do have major successes in areas that take real engineering. But they also have a lot of failures. It will depend how the internal politics play out, I imagine.

Remind me which company originated PyTorch?
Remind me that PyTorch is not a GPU driver.
Genuine question: given LLMs' inexorable commoditization of software, how soon before NVDA's CUDA moat is breached too? Is CUDA somehow fundamentally different from other kinds of software or firmware?
Current Gen LLMs are not breaching the moat yet.
Yeah they are. llama.cpp has had good performance on CPU, AMD, and Apple Metal for at least a year now.
The hardware is not the issue. It's the model architectures leading to cascading errors.
Nvidia has everything they need to build the most advanced GPU Chip in the world and mass produce it.

Everything.

They can easily just do this for more optimized chips.

"Easily" in the sense that it wouldn't require that much investment. Nvidia knows how to invest and has done so for a long time. Their Omniverse and their Isaac robotics platform are both expensive. Nvidia has 10x more software engineers than AMD.

They still go to TSMC for fab, and so does everyone else.
For sure. But they also have high volume and know how to do everything.

Also certain companies normally don't like to do things themselves if they don't have to.

Nonetheless, Nvidia is where it is because it has CUDA and an ecosystem. Everyone uses this ecosystem, and then you just run that stuff on the bigger version of the same ecosystem.

> What's to stop them from entering the TPU race with their $5 trillion valuation?

Valuation isn’t available money; they'd have to raise more money in the current investment environment, which is probably tighter for them, to enter the TPU race, since the money they have already raised, on which that valuation is based, is already needed to provide runway for what they are already doing without putting money into the TPU race.

Nvidia is already in the TPU race, aren't they? This is exactly what the tensor cores on their current products are supposed to do, but they're just more heterogeneous GPU-based architectures and exist with CUDA cores etc. on the same die. I think it should be within their capability to make a device which devotes an even higher ratio of transistors to tensor processing.
$5 trillion valuation doesn't mean it has $5 trillion cash in pocket -- so "it depends"
If you look at the history of how GPUs evolved:

1. There used to be fixed-function hardware for certain graphics stages.

2. Programmable massively parallel hardware took over. Nvidia was at the forefront of this.

TPUs seem to me similar to fixed-function hardware. For Nvidia it's a step backwards, and even though they have been moving in this direction recently, I can't see them going all the way.

Otherwise you don't need CUDA, but hardware guys who write Verilog or VHDL. They don't have that much of an edge there.

Why dig for gold when you are the gold standard for the shovel already?