Hacker News new | ask | show | jobs
by thanhhaimai 59 days ago
Opinions are my own.

I think the biggest winner of this might be Google. Virtually all the frontier AI labs use TPU. The only one that doesn't use TPU is OpenAI due to the exclusive deal with Microsoft. Given the newly launched Gen 8 TPU this month, it's likely OpenAI will contemplate using TPU too.

14 comments

Many labs use TPUs, but not exclusively. Most labs need more compute than they can get, and if there's TPU capacity, they'll adapt their systems to be able to run partially on TPUs.
Why is AMD not more popular then if labs are so flexibly with giving away CUDA?
people are trying, especially for inference. For training, it’s just too high risk to tank your training I think.

TPUs are at least dogfooded by Google deepmind, no team AFAIK has gotten the AMD stack to train well.

Interesting. Why? My current mental model is that AMD chips are just a bit behind, so, less efficient, but no biggie. Do labs even use CUDA?
This is somewhat out of date (Dec 2024), but gives you some idea of how far behind AMD was then: https://newsletter.semianalysis.com/p/mi300x-vs-h100-vs-h200...

Pull quotes:

AMD’s software experience is riddled with bugs rendering out of the box training with AMD is impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD’s weaker-than-expected software Quality Assurance (QA) culture and its challenging out of the box experience.

[snip]

> The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs. To get AMD to a usable state with somewhat reasonable performance, a giant ~60 command Dockerfile that builds dependencies from source, hand crafted by an AMD principal engineer, was specifically provided for us

[snip]

> AMD hipBLASLt/rocBLAS’s heuristic model picks the wrong algorithm for most shapes out of the box, which is why so much time-consuming tuning is required by the end user.

etc etc. The whole thing is worth reading.

I'm sure it has (and will continue to) improved since then. I hear good things about the Lemonade team (although I think that is mostly inference?)

But the NVidia stack has improved too.

That’s insane. There should be a big team of people at AMD whose whole job is just to dogfood their stuff for training like this. Speaking of which, Amazon is in the same boat, I’m constantly surprised that Amazon is not treating improving Inferentia/Trainium software as an uber-priority. (I work at Amazon)
Anecdotal but over several years with an AMD GPU in my desktop I've tried multiple times to do real AI work and given up every time with the AMD stack.
Yet another reason to doubt claims that ”software is solved”.

Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware, because Claude could one shot a solution, but that was clearly a whiteboard hypothetical instead of a real system with warts, issues and nuance.

i'm doing inference on a free mi300x instance from AMD right now. not sure if the software stack is just old or what, but here's what i've observed: stuck on an old version of vllm pre-Transformers 5 support. it lacks MoE support for qwen3 models. oss-120b is faaaar slower than it should be.

int8 quantization seems like it's almost supported, but not quite. speeds drop to a fraction of full precision speed and the server seems like it intermittently hangs. int4 quantization not supported. fp8 quantization not supported.

again, maybe AMD is just being lazy with what they've provided, but it's not a great look.

right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting PP @ 4500 tokens/sec and TG @ 1300 tokens/sec

amd gpus compete but they lack the interconnect. NVLink performance is a huge deal for training.
> Do labs even use CUDA?

From the papers I've read and the labs that I have worked in personally, I would say that most scientists developing Deep learning solutions use CUDA for GPU acceleration

What I hear is that getting your network to work on AMD is a huge pain.
Yeah, historically it’s been software that’s limited AMD here. Not surprised to hear that may still be the issue. NVidia’s biggest edge was really CUDA.
I don’t know what’s a chicken and what’s an egg here. But ROCm support is often missing or experimental even in very basic foundational libraries. They need someone else to double down on using their chips and just break the software support out of the limbo.
This is what I've heard on the "street". Building a CUDA-compatible stack for AMD's hardware requires highly-paid SWEs. It's a very niche field, and talent is hard to come by.

But AMD does not want to pay these specialized SWEs the market rate. Their existing SWEs would be up in arms saying, basically, "what are we, chopped liver??", or so the thinking goes.

So AMD is stuck with a shitty software stack which cannot compete with CUDA.

If I were making such decisions, I would just cull the number of existing SWEs down by 50%, and double the pay for remaining ones. And then go out and hire some top talent to build a good software stack.

> highly-laid SWEs

Freudian slip?

even google doesnt only use TPUs.
Google is in a different position to others in that they're the only frontier lab with a cloud infra business. It obviously makes sense to sell GPUs on cloud infra as people want to rent them. In that respect Google buys a ton of GPUs to rent out.

What's unclear to me is how much Google uses GPUs for their own stuff. Yes Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (recent release announced last week), but is any training or inference for Gemini really happening on GPUs? This is unclear to me. I'd have guessed not given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.

Caveat, I work at Google, but not on anything to do with this. I'm only going on what's in the press for this stuff.

> Gemini on-prem boxes (recent release announced last week)

Do you have any more information on this? I only found this article about it: https://venturebeat.com/technology/googles-gemini-can-now-ru...

It mentions that Gemini can run on eight NVIDIA GPUs, but not which GPU and which Gemini model. Either way, this puts an upper bound of 288 * 8 = 2304 GB on the size of the Gemini model, which as far as I know has been a secret until now.

I have no more info, but wouldn't be able to share if I did. The public info like that article is what I'm going off.
I have most likely outdated info, I left Google Research 4y ago. Back then, available TPU instances were plenty and GPU scarce. Nobody wanted to mess with an immature crashing compiler and very steep performance cliffs (performance was excellent only if you stayed within the guardrails, and being outside was supported and not even resulting in a warning - as it was so common in code). But I believe most of it has changed for the better for TPUs.
And almost by happenstance Apple. Turns out they have a great platform for inference and torched almost nothing comparatively on Siri. The Apple/Gemini deal is interesting, Google continues to demonstrate their willingness to degrade their experience on Apple to try and force people to switch.
If you do the math (I did), in 2 years, open source models that you can run on a future MacBook Pro will be as capable as the frontier cloud models are today. Memory bandwidth is growing rapidly, as is the die area dedicated to the neural cores. And all the while, we have the silicon getting more power efficient and increasingly dense (as it always does). These hardware improvements are coming along as the open source models improve through research advancements. And while the cloud models will always be better (because they can make use of as much power as they want to - up in the cloud), what matters to most of us is whether a model can do a meaningful share of knowledge work for us. At the same time, energy consumption to run cloud infrastructure is out-pacing the creation of new energy supply, which is a problem not easily solved. I believe scarcity of energy will increasingly drive frontier labs toward power efficiency, which necessarily implies that the Pareto frontier of performance between cloud and local execution will narrow.
A Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].

To run a 8 bit quantized version of that you need roughly 5TB of RAM.

Today that is around 18 NVidia B300. That's around $900,000, without including the computers to run them in.

It's true that the capability of open source models is improving, but running actual frontier models on your MPB seems a way off.

[1] https://x.com/elonmusk/status/2042123561666855235?s=20 (and Elon has hired enough people out of those labs to have a fair idea)

People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.

Today's LLMs are able pack much more capabilities into fewer parameters compared to 2023. We might still be at the very rudimentary phase of this technology there are low-hanging efficiency gains to be had left and right. These models consume many orders of magnitude more energy than a human brain, this all seems like room for improvement.

The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?

There is a huge gap between "in two years" and "theoretically possible"
>> People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.
The OP said "as capable as the frontier cloud models are today" which might assume model improvements that do more with less. Opus 4.7/Gpt5.5 performance might be achievable with a fraction of the parameters.
Exactly. I also feel like being able to choose a model for the use case could be worth an idea. So instead of trying to squeeze all kinds of knowledge into a single model, even if it's moe, just focus models on use cases. I bet you only need double digit billion parameter models for that with same or even better performance
Opus and Gpt are generic LLMs with knowledge on all sort of topics. For specific use cases you probably don't need all the parameters? Suppose you want to generate code with opencode, what part of the generic LLM is needed and what parts can be removed?
we're already doing that, it's called distillation and how models like deepseek are trained.
I think your own math leads to the conclusion the public apis are not serving models of that size. They couldn’t afford to
As far as I can tell Minimax M2.7 is better than anything available a year ago, but it runs on an ordinary PC. Will that continue? Not sure, but the trend has continued for the last two years and I don't know of any fundamental limits the models are approaching.
I wish more people were more aware of this. I think so much of the current optimism is based on "it doesn't matter if companies are raising prices since I'm just going to run the model locally", doesn't fly.
> A Opus 4.7/Gpt5.5 class model is 5 trillion parameters.

Or so they say.

If it's true then that just shows how far behind the cloud providers are lagging while wasting investor money.

(There's a huge amount of diminishing returns in increasing parameter counts and the intelligent AI company should be hard at work figuring out the optimal count without overfitting.)

Do that will only be possible with something like better 3D NAND flash memory, needs a new hardware. People are already trying to bring that the market. Contemplated taking a compiler position in such a company.
HBF is a non-starter, it runs way too hot compared to DRAM (which only pays for refresh at idle) for the same memory traffic. Only helps for extremely sparse MoE models - probably sparser than we're seeing today.
> A Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].

You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.

> You could run it on a cluster of nodes

Not sure this is a MBP either.

Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.
I did this calculation a bit ago and don't think frontier models are just a few MacBook Pro generations away. Yes numbers reliably go up in tech in general but in specific semiconductors & standards have long lead-times and published roadmaps, so we can have high confidence in what we're getting even in 3-4 years in terms of both transistor density and RAM speeds.

In mid-2028 we have N2E/N2P with around 15% greater transistor density than today's N3P, and by EOY2028 we'll likely have A14 with about 35-40% density improvement.

Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s -> ~400GB/s, and Max's from 614GB/s -> ~800GB/s.

Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw

What do you need 3 TB/s memory bandwidth for in a single user context? DeepSeek V4 pro (the latest near-SOTA model) has about 25 GB worth of active parameters (it uses a FP4 format for most layers) which gives 12 tok/s on a 307 GB/s platform as the current memory bandwidth bottleneck, maybe a bit less than that if you consider KV cache reads. That's not quite great but it's not terrible either for a pro quality model. Of course that totally ignores RAM limits which are the real issue at present: limited RAM forces you to fetch at least some fraction of params from storage, which while relatively fast is nowhere near as fast as RAM so your real tok/s are far lower (about 2 for a broadly similar model on a top-end M5 Pro laptop).
That's not "math". That's a "wild guess", or baseless extrapolation at best.
My son doubled in size in the first 8 months of his life. At age 12, he will be larger than the Moon.
One of my favorite xkcd

https://xkcd.com/605/

So long as you don't require deep search grounding like massive web indexes or document stores which are hard to reproduce locally. You can do local agentic things that get close or even do better depending on search strategy, but theoretically a massive cloud service with huge data stores at hand should be able to produce better results.

In practice unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is not equivalent to an hour of local agentic searching, presumably.

I think that problem will improve a little in the coming years as we kind of create optimized data curation, but the information world will keep growing so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.

Also, all the cloud models don't have to be the best frontier models, and you don't need to focus on hitting the benchmark of shrinking Opus 4.7 down to a single MBP to make significant improvements. If you get it so that an Opus 4.7 benchmark-compatible model can run in $250k of datacenter capex (and associated reduced opex for power+cooling) that'd be a massive cost improvement that makes the cloud models cheaper. And for most consumers that'll probably be good enough. You don't need to run on a $5k laptop to make a big difference.
Show your working / explain your math?
Apple is basically in the same boat as AMD and Intel. They have a weak, raster-focused GPU architecture that doesn't scale to 100B+ inference workloads and especially struggles with large context prefill. TPUs smoke them on inference, and Nvidia hardware is far-and-away more efficient for training.
This doesn't get talked about enough - the GPU is weak, weak, weak. And anyone who can fix them will go to a serious AI company (for 2-3x the salary).
The GPU is monstrously good. Depending on the workload, the M1 series GPU using 120W could beat an RTX 3090 using 420W.

Same with the CPU. Linux compiled faster on an M1 than on the fastest Intel i9 at the time, again using only 25% of the power budget.

And the M-series has only gotten better.

It is kind of sad Apple neglects helping developers optimize games for the M-series because iDevices and MacBooks could be the mobile gaming devices.

The GPUs are bottom-barrel for compute-focused industries. It is mobile-grade hardware that arguably can't even scale to prior Mac Pro workloads.

> The GPU is monstrously good. Depending on the workload, the M1 series GPU using 120W could beat an RTX 3090 using 420W.

You're just listing the TDP max of both chips. If you limit a 3090 to 120W then it would still run laps around an M1 Max in several workloads despite being an 8nm GPU versus a 5nm one.

> It is kind of sad Apple neglects helping developers optimize games for the M-series

Apple directly advocated for ports like Death Stranding, Cyberpunk 2077 and Resident Evil internally. Advocacy and optimization are not the issue, Apple's obsession over reinventing the wheel with Metal is what puts the Steam Deck ahead.

Edit (response to matthewmacleod):

> Bold of them to reinvent something that hadn't been invented yet.

Vulkan was not the first open graphics API, as most Mac developers will happily inform you.

> The GPUs are bottom-barrel for compute-focused industries. It is mobile-grade hardware that arguably can't even scale to prior Mac Pro workloads.

Surprised Apple didn't create a TPU-like architecture. Another misstep from John Gianneadrea.

> Vulkan was not the first open graphics API, as most Mac developers will happily inform you.

OpenGL had become too unmanagable which is why devs moved to DirectX.

Unless you meant a different one?

Apple's obsession over reinventing the wheel with Metal

Bold of them to reinvent something that hadn't been invented yet.

>the M1 series GPU using 120W could beat an RTX 3090 using 420W

You're cooked if you actually believe this

I very recently ran the numbers on these GPUs for an upcoming blog post. The token generation performance is bad, but the prefill performance is _really_ bad.

For a Qwen 3.6 35B / 3B MoE, 4-bit quant:

- parsing a 4k prompt on a M4 Macbook Air takes 17 seconds before generating a single token.

- on an M4 Max Mac Studio it's faster at 2.3 seconds

- on an RTX 5090, it's 142ms.

RTX 5090 uses more power than an M4 Max Mac Studio but it's not 16x more power.

Somehow Apple has always been able to sell their stuff as somehow Magic. Remember the megahertz myth? Apple hertzes and apple bytes are much better than PC hertzes and bytes because they are made by virgin elves during a full moon.
On Geekbench 5, the M1 hits 483 FPS and the RTX 3090 hits 504 FPS.

There are other workloads where the M1 actually beats the 3090.

Apple does plenty of hyping but it's always cute when irrational haters like you put them down. The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.

Apples and limes.

The context of this thread isn't consumer chips, but Apple's analog to an H/B200.

Well Apple is in the consumer computing business.
What do TPUs do to improve on GPUs at inference?
More compute
Apple is in a much better boat than AMD or Intel. They have a gigantic warchest and can just snap up whoever looks like a leader coming out of the bubble burst.
It's becoming increasingly clear that there is no moat on models. The winners will be the ones who have existing products and ecosystems they can tie AI in to. You will pay adobe for credits because that will be the only AI that works in Photoshop, you will pay microsoft because only theirs will work on your microsoft cloud apps.

Open AI has nothing. Their tech will rapidly be devalued by free models the moment they stop lighting stacks of cash on fire.

I kind of agree with you at this point. When ChatGPT was rapidly gaining popularity I thought that they will eventually replace search (esp. for shopping), which would have given them a huge ad revenue. Maybe they could have even tried social networking e.g., to help you sort out the huge flow of information that today's social networks are and get to the important/rewarding/whatever posts. But now ChatGPT is kind of getting commoditized. I would even dare say that gemini feels to me a bit better now, so the search route for ChatGPT is clearly gone.
OpenAI is handling 15% of US traffic.
Counterpoint: Apple's opportunity to invest in GPGPU architectures was ~2017 when Apple Silicon was in it's design stages. Apple always knew they were surrendering a large market segment by depreciating CUDA and OpenCL, they just never knew how big the market segment would get. Their liquid cash is wasted mid-bubble, and waiting for it to pop is not a realistic timeline anymore.

Arguably, Apple isn't even in a boat right now. At least AMD and Intel both ship hardware that synergizes with CUDA - Apple jumped off that ship, their hardware doesn't even come up in infrastructure discussions where AMD, Intel and Nvidia are taken for granted.

They also degrade their own direct services with little warning or thought put into change management, so, to be fair, Apple may be getting the same quality of service as the rest of us.
I think that's just how Google is, by nature. They don't intentionally degrade their services. They just aren't a customer centric company. They run on numbers. As a corporate, it doesn't really encourage support and maintenance work either.
Indeed. I'm wondering if Apple's "miss the train" with AI ended up being a blessing for them. Not only in the Google deal but also there's a lot of people doing interesting stuff locally..
Maybe I am missing something here, but if all the frontier AI labs use TPU, why is Nvidia making so much money?
training, multicloud, onprem, resale
> The only one that doesn't use TPU is OpenAI

For inference? This is from July 2025: OpenAI tests Google TPUs amid rising inference cost concerns, https://www.networkworld.com/article/4015386/openai-tests-go... / https://archive.vn/zhKc4

> ... due to the exclusive deal with Microsoft

This exclusivity went away in Oct 2025 (except for 'API' workloads).

  OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.
https://blogs.microsoft.com/blog/2025/10/28/the-next-chapter... / https://archive.vn/1eF0V
OpenAI uses GCP. I don't know if they use TPUs.

https://www.reuters.com/business/retail-consumer/openai-taps...

I wish Google would launch Mac Mini-like devices running their consumer-grade TPUs for local inference. I get that they don't want it to eat into their GCP margins, but it would still get them into consumer desktops that Pixel Books could never penetrate (Chromebooks don't count and may likely become obsolete soon due to MacBook Neo).
> Microsoft will no longer pay a revenue share to OpenAI. > Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress, at the same percentage but subject to a total cap.

How is this helping OpenAI?

Had written a blog post on the same a few days back, if anyone's interested in readng (hardly 5 minute read): Can Google Win the AI Hardware Race Through TPUs?

https://google-ai-race.pagey.site/

Hello, your link says "~20 min read" wich seems to be the case!
I guess I myself have read it too many times by now so in mind it was just 5 minute read when I made this comment... sorry..
Well, I guess in that case it is hardly a 5 minute read.
Why is it called frontier and why is it called a frontier ai lab?
Because they are based on [the west coast of the US](https://en.wikipedia.org/wiki/American_frontier). DeepSeek, Z.ai, Moonhsot, and Mistral are never called frontier because they aren't based in California.
Outside of California they're sparkling ai labs.
I am supposed to say we are using these sparkle models then?
So its like a prarie thing
Huh, interesting. History casts a long shadow.
It’s like “cutting edge”. A metaphor for the newest and best.
Dont forget Elon, i am sure this news will come up on the up and coming OpenAI vs Elon Musk trail starting soon! I cant wait to hear all the discovery from this trail
I heard a lot of rumors that google is cooking. And it is what will win the ai game
In the recent Dwarkesh Podcast episode Jensen Huang (Nvidia) said that virtually nobody but Anthropic uses TPUs. How does that add up?
I am not sure what context Jensen said that. But midjourney uses tpu. Apple uses tpu. They are no other frontier labs that use it, but Google + Anthropic is 2 out of 3 frontier lab so.....

You could reasonably say that "A majority of frontier labs uses TPU to train and serve their model."

Afaik, TPUs are only used for inference, not training. Maybe that was also what the quote referred to.
Mayhaps! But I think as far as google, anthropic[1] and apple[2] goes, they do use the tpus for training. Ofc v4 and v5 (older generations of tpus) were more specialized for search related embedding workloads and i could see people not using them for training.

[1]: We train and run Claude on a range of AI hardware—AWS Trainium, Google TPUs - April 6th, Anthropic on Google and Broadcom partnership [2]: "[Apple foundation model]... builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs" - Apple in 2024

> How does that add up?

He's been saying whatever is good for Nvidia for years now without any regard for truth or reason. He's one of the least trustworthy voices in the space.

Jensen hallucinates more than any llm, he just speaks without thinking all that much about what he says and he generalizes a lot. Trying to hold him accountable to imprecisions and gross simplifications is just going to frustrate whoever tries without changing one bit of his behavior.
You're asking why a businessman would downplay the use of a competing product line?
This is the same guy who said OpenClaw was the most important software release ever. Statements like this make me question how technically competent these tech CEOs are
You should instead question how honest they are.
Is technical competence the primary measure of tech CEOs at this point? Points vaguely at Elon Musk and the upcoming IPO
Who is the other frontier lab other than Anthropic, OpenAI, and Google? I thought they were ahead of everyone else.
Folks who make Deepseek, Qwen, GLM, MiniMax, Kimi and MiMo.
They're at the frontier of last year. They compete with Opus 4.5. They don't yet compete with current frontier models.

They'll presumably catch up, there is no monopoly on talent held by the US. And, that's more true than ever now that the US is actively hostile to immigrants. Scientists who might have come to the US three years ago have little reason to do so now.

Since Gemini 3.1 Pro is considered to be at frontier and GLM 5.1 does better than it in coding benchmarks it would be fair to say GLM 5.1 is a frontier model.
Nit: scientists have the same reasons to do so now, the same as ever. They just have additional reasons to not do so.

But even that distinction is only temporary, since we're determined to piss away any remaining research lead that draws people in.

Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".

> Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".

Won't be enough to undo the damage. The US would have to do a full about face, prosecute crimes of the current administration and enact serious core reforms to make it impossible for things to drastically change again in 4 years. Also known as, never going to happen because even the current opposition party doesn't actually want structural change. The world has seen how bad the US can get from a single election, and that isn't changing any time soon.

It's kind of hard to say this unless you go out of your way - the scaffolding for interacting with the raw model is a lot better now for many tasks. Is it that 4.7 is so much better than 4.5 or claude 1.119 is so much tuned to squeeze utility out of the LLM despite the hallucinations and lack of self awareness etc. Certainly the current products are great, but I think it's hard to separate the two things, the raw model and the agent workflow constraining the model towards utility.
You can use Claude Code with other models, so one could test that theory. https://openrouter.ai/docs/guides/coding-agents/claude-code-...
I am using Claude Code with GLM, MiniMax, Kimi and MiMo.
> Scientists who might have come to the US three years ago have little reason to do so now.

Been saying that about EU and China for decades now.

Yet the top European and Chinese still come to the US. Even in April 2026.

Yeah I thought all of those were generally acknowledged to be a little behind the big 3.
He forgot one other big company that uses TPUs besides Anthropic...
Maybe he's never heard of Google.
The only reason anyone uses a TPU is because they couldn't get the best GPUs.
Okay? I'm not sure where you're going with this.

Google's TPUs have obvious advantages for inference and are competitive for training.

You think the company that just gave 40B to Anthropic is the winner? Interesting.
That deal is a win-win for Google. If they develop a better coding model than Anthropic and beat them at coding, then they win. If they don’t, they still win by making a ton of money from Anthropic long term.
Well, it's a lose for Google if all the money disappears into thin air - but I agree that it's mostly upsides for them because of how (relatively) small the investment is for this much upside.
You think the company that just gave 40B to Anthropic isn’t the winner? Interesting.
Was Microsoft the winner based on their 50B investment in OpenAI?
If OpenAI had won the enterprise race, then maybe?