by SwellJoe 13 days ago
Google keeps flexin'. It's surprising that Gemini isn't more competitive against Claude or OpenAI models for code and agentic use, because it's clear Google still has some of the best AI people in the business. But, I guess Google is focused on stuff that runs on phones and near-realtime use cases, rather than the big thinky LLMs.

All these efficiency improvements seem likely to be really important to the future of AI, though, as the money starts flowing the other direction. The days of subsidized tokens to try to lock people into specific ecosystems are coming to an end, and we're going to have to start paying what it actually costs.

The companies that figure out how to make it cost-effective to run really smart models are the ones that will win. DeepSeek costs an order of magnitude less than GPT 5.5 or Opus 4.8. It's worse than either, but not catastrophically worse. I'll happily pay ten times as much for the best coding model, because it saves enough human time to justify it, but not a hundred times as much, which is where things seem to be heading (GPT 5.5 Pro cost over 200 times as much as DeepSeek in some benchmarks I recently did, and ~30 times as much as Opus 4.8).
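As a rough sanity check on those multiples, here is a minimal sketch that uses only the ratios quoted above, with DeepSeek normalized to 1. None of these are real prices, and the function and variable names are made up for illustration.

```python
# Relative costs implied by the ratios quoted in the comment above.
# DeepSeek is normalized to 1.0; these are multiples, not real prices.
def implied_relative_costs(pro_vs_deepseek: float, pro_vs_opus: float) -> dict:
    deepseek = 1.0
    pro = pro_vs_deepseek * deepseek   # "over 200 times as much as DeepSeek"
    opus = pro / pro_vs_opus           # "~30 times as much as Opus 4.8"
    return {"deepseek": deepseek, "opus": opus, "pro": pro}


costs = implied_relative_costs(pro_vs_deepseek=200.0, pro_vs_opus=30.0)
for name, cost in costs.items():
    print(f"{name}: {cost:.1f}x DeepSeek")
```

Taken at face value, the two quoted ratios put Opus at roughly 6.7 times DeepSeek, which is in the same ballpark as the "order of magnitude" figure, allowing for the fact that the numbers come from different benchmark runs.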

3 comments

Google is clearly gimping the Gemma models. There is a 122B Gemma 4 that was never released, but was part of the announcement tweet. Plus, they weren't going to release MTP until people figured out they're running it on the Pixels.
I dunno about that. Gemma 4 is probably the best model for general self-hosted use for almost everyone who doesn't have a data center in their basement. They didn't have to release it at all. They didn't have to release speculative decoding drafters, and they didn't have to release the QAT versions of the models, which make the 4-bit quantization perform very close to the bigger versions and can run in 32GB. I'd love a 122B version of it, and I didn't realize they'd ever announced one was coming (though I remember there being speculation about it). But, also, I'm happy they're doing so much with it. They've got all the sizes covered, it has great prose for an LLM (better than even most larger models), it's got great audio and vision, and it has broad language support. As self-hosted general purpose models go, it's the total package.
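For a sense of why 4-bit matters for a 32GB box, here is a back-of-the-envelope sketch. It counts the weights only and ignores KV cache, activations, and runtime overhead, so real headroom is smaller. The helper names are hypothetical, and no specific Gemma parameter count is assumed.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def max_params_billion(budget_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights fit in the budget."""
    return budget_gb * 8 / bits_per_weight


# Upper bounds for a 32 GB budget, weights only.
print(max_params_billion(32, 4))    # 4-bit weights
print(max_params_billion(32, 16))   # 16-bit weights
```

At 4 bits per weight, a 32 GB budget covers the weights of a model up to about 64B parameters, versus about 16B at 16 bits, which is why a well-behaved 4-bit quantization makes such a difference for self-hosting.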

Qwen 3.6 is maybe better for code (though I'm beginning to think otherwise after some benchmarking I've been doing, where Gemma 4 has been overperforming expectations), but for just about anything else, Gemma 4 is the one.

If they're gimping it, why is nobody else making a better one that small?

Fable's costs are twice Opus' and it's clearly quite competitive with GPT-Pro, so that seems like it might be a good option for you if the trigger-happy safeguards aren't too much of a problem. Google has their own "Deep Research" option in this space which seems to work well.

The nice thing about DeepSeek is its ability to be run on local hardware, with no API costs involved. If you care deeply about that, then it being a bit worse than Opus or GPT isn't really a problem.

I think Google will win out in the end. They are concentrating on what matters: performance per watt and performance per dollar. They are building their own inference hardware and working towards edge computing, which removes latency and compute overheads. These big LLMs are not yet cost-effective; Google is just letting its competitors burn their investment funds to "sell" to consumers at below cost.

After the AI bubble bursts, it will be the likes of Google that come out the other side still wearing their shirts. I think this bubble is out to scalp some giants.