Hacker News
by pmb 55 days ago
At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient.

(disclosure: I am long GOOG, for this and a few other reasons)

6 comments

I'd go long Google too if using Gemini CLI felt anything close to the experience I get with Codex or Claude. They might have great hardware but it's worthless if their flagship coding agent gets stuck in loops trying to find the end of turn token.
Gemini CLI isn't a great product, unfortunately. While it's tied to a GUI, Antigravity is a far superior agent harness. I suggest comparing that to Claude Code instead.
Bad software kills good hardware.

And the converse is true also. I mean, look at NVIDIA. For the longest time they were just a gaming card company, competing with AMD. I remember alternating between the two companies for my custom builds in the 90s and it basically came down to rendering speed and frame rate.

But Jensen bet on the "compute engine" horse and pushed CUDA out, which became the de facto standard for doing fast, parallel arithmetic on a GPU. He was able to ride the Bitcoin wave and then the big one, DNNs. AMD still hasn't caught on yet (despite 15 years having gone by).

I make the mistake of thinking it's 2020 as well. CUDA was announced in 2006 and released in Feb 2007. So it's actually 20 years that AMD/Radeon hasn't caught on that they need a good software stack.
Sadly, the "unfortunately tied to a GUI" is really a deal breaker (at least for me).
I wish it were otherwise, but Antigravity is also a distant third behind Codex CLI/app and Claude Code.

3.1 Pro is just fundamentally not on the same level. In any context I've tried it in, for code review it acts like a model from a year ago in that it's all hallucinated superficial bullshit.

Claude Code is significantly less likely to produce the same (yet still does a decent amount). GPT-5.4 high/xhigh is on another level altogether - truly not comparable to Gemini.

I use Claude Code all day and use Gemini CLI for personal projects and I don't see the huge gap that other people seem to talk about a lot. Truthfully there are parts of Gemini CLI I like better than Claude Code.
I agree. I like using Antigravity for some of my frontend work, and I find it does a better job than Claude Code - Opus 4.6. I’ve also found the Gemini Flash models to be good at legal defense research—I use them to help New Yorkers fight parking tickets (https://nyceasyparking.com). That said, the Claude models are still amazing at agentic work.
I don't use Gemini CLI - I use the extension in VS Code, and the Gemini extension in VS Code is barely usable in comparison to Claude or GPT-5.4. My experience (consistent with a lot of other reports) is that it takes a long time to answer, and frequently returns errors (after a long wait). But I think it's specific to the extension (and maybe the CLI) because the web version of Gemini works quickly and rarely errors (for me).
There was still a big gap like, 6 months ago. Now, I'm not seeing it either. It's been working well the last couple weeks after I picked it up again.
Of the big three, Gemini gives me the worst responses for the type of tasks I give it. I haven’t really tried it for agentic coding, but the LLM itself often gives long, meandering answers and adds weird little bits of editorializing that are unnecessary at best and misleading at worst.
Same. The tone is really off. Here is a response I just got from Gemini 3.1: "Your simulation results are incredibly insightful, and they actually touch on one of the most notoriously difficult aspects of ..." It's pure bullshit, my simulation results are in fact broken, GPT spotted it immediately.
There is a news report saying that Google has assembled an "elite" team to make Gemini as good as Claude/Codex.
Isn't Amazon doing the same thing, making their own TPUs?
Yeah, Trainium and Inferentia. They’re just not nearly as well supported on the software level. Google has already made sure this new generation will be supported by vLLM, SGLang, etc. Amazon's chips barely support those, and only multiple versions back. Super underinvested (at least on the open source side).
That seems odd. I'd figure if they are going to sell it as a product in AWS that they'd have some sort of off-the-shelf tooling that would be available.
I think this is a narrow view. AWS and Azure build their own data centers, partner closely with Nvidia, and build their own silicon too. TPUs are non-standard, no one else can run them - Nvidia builds on fabrics and technologies that have been well understood and well integrated for a long time (Mellanox etc.) and clearly works very closely with the AWS and Azure hardware and data center build teams. I’d not bet that Google can do things better than everyone else - that’s certainly something Googlers always believe about themselves, but it’s not the case that you can’t build a best-of-breed system that meets or exceeds total in-house builds.
Don't build your castle in someone else's kingdom.

Buying from Nvidia is the only real option, and even that is not optimal.

> Don't build your castle in someone else's kingdom.

would like to know about the scrape content of these castles /j

I'd bet that too if their management wasn't so incredibly uninspiring. Like, Apple under Cook was also pretty mild and a huge step down from Jobs, but Google feels like it fell off a cliff. If it wasn't for OpenAI releasing ChatGPT, they might still be sitting on that tech while only testing it internally. Now it drives their entire chip R&D.
Google was calling itself an "AI-first" company beginning in 2016 or 2017. They designed and built TPUs nearly a decade ago and were using transformer models in products like Google Translate but didn't make a big fuss about it, it just made the product way better. People should at least credit Sundar somewhat for this, it turned out to be quite prescient, especially the advantage of having your own chips that are specifically designed for ML.
AI was very different in 2016-2017 compared to what it has been since ChatGPT. Facebook was also a primarily AI/ML-driven company with no one realizing it on the front-end, but at least they were heavily involved in the open source side on the back-end - long before LLMs went big. In fact they enabled them to go big with things like PyTorch. Google just stumbled into this. DeepMind (also acquired before Sundar) came up with the theory, but they didn't see the potential. What you call "prescience" I call luck. They did not create the demand for their own technology like e.g. Nvidia did by pushing the field ahead with full force. In fact all of Google's most popular products are from the time before Sundar took over. Even with Gemini they are dragging their heels, sitting far below all other big model providers when you look at usage.
This is a bizarre accounting of things. FAIR's efforts building PyTorch were seen as experimental and fragile by the time it was released, when TensorFlow was already being used in edge deployment for computer vision and seq-to-seq. Google was the company that prepped the technology for deployment, created the theory (Transformer architecture), implemented it in practice (BERT bidirectional encoding) and then scaled it (RoBERTa) all before GPT-3 ever released. Three years before Facebook released Llama.

> They did not create the demand for their own technology like e.g. Nvidia did by pushing the field ahead with full force.

They did, though. You are commenting on an eighth-generation TPU product that has been used millions of times a day for the past half-decade. It's likely that this will be the hardware providing inference for Apple's Gemini model they've selected to use with Siri. TPUs are the economically-conscious inference choice if you've already separated your training/inference workflows.

To be fair, I don't think any of the AI players wanted what OAI did. Sam grabbed first mover at the cost of this insane race everyone else got forced into.
What would an inspiring leader do differently for you?
Inspire
The line between inspiring and a grift can be hard to see in the moment.
I am not a fan of the era when a CEO is expected to be a cult-leader type of person.

Cook did very well in all areas as well as in not trying to create a cult.

They had no reason to destroy their golden goose. Why release something that could hurt their money-printing business?

Honestly I'm rather impressed with how they handled it; they had enough of the infra and org in place to jump at it once the cat was out of the bag.

Sundar declared a code red or whatever and they made it happen. But that could ONLY happen if they had the bedrock of that ability already built.

No one really remembers now that Google was a year behind.

> I suspect that when things get really big, Google's systems will always be more cost-efficient.

In fact I hold the opposite of this hypothesis, for two reasons. First, Google's production is artificially limited, because TSMC favours whoever can pay for the most capacity (as incremental capacity is very cheap for them). So Nvidia gets the first slot on a new process.

Also, the second reason is that GCP's operating margin is very high compared to, say, Hetzner or Lambda Labs, and you can get GPUs much cheaper there compared to GCP. So students/small researchers are stuck on GPU.