Hacker News
by dagmx 551 days ago
Regarding TPUs, sure, for the stuff that’s running in the cloud.

However, their on-device TPUs lag behind the competition, and Google still seems to struggle to move significant parts of Gemini to run on device as a result.

Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.

I am curious if they’ll introduce something like Apple’s private cloud compute.


i don’t think they need to win the on device market.

we need to separate inference and training - the real winners are those who have the training compute. you can always have other companies help with inference

> i don’t think they need to win the on device market.

The second Apple comes out with strong on-device AI - and it very much looks like they will - Google will have to respond on Android. They can't just sit and pray that e.g. Samsung makes a competitive chip for this purpose.

I think Apple is uniquely disadvantaged in the AI race to a point people don't realize. They have less training data to use, having famously been focused on privacy for its users and thus having no particular advantage in this space due to not having customer data to train on. They have little to no cloud business, and while they operate a couple of services for their users, they do not have the infrastructure scale to compete with hyperscaler cloud vendors such as Google and Microsoft. Most of what they would need to spend on training new models would go to the very companies that already have their own models, supercharging their competition.

While there is a chance that Apple might come out with a very sophisticated on-device model, the problem is that they would only be able to compete with other on-device models. The magnitude of compute needed to keep pace with SOTA models is not achievable on a single device. It will take many generations of Apple silicon in order to compete with the compute of existing datacenters.

Google also already has competitive silicon in this space with the Tensor series processors, which are being fabbed at Samsung plants today. There is no sitting and praying necessary on their part as they already compete.

Apple is a very distant competitor in the space of AI, and I see no reason to assume this will change. They are uniquely disadvantaged by several of the choices they made on their way to mobile supremacy. The only thing they currently have going for them is the development of their own ARM silicon, which may give them the ability to compete with Google's TPU chips, but there is far more needed to be competitive here than the ability to avoid the Nvidia tax.

There’s an easy solution here: Apple isn’t trying to compete with the big models everyone else is running. They’re betting in the opposite direction, that many small models are a better value add for their customers. And they can call out to other services as needed for the larger stuff.

I’m in the camp that this is the right call for consumers, instead of trying to compete on the large model side. They’ve yet to deliver on their full promise, but if they can, it’s the place where I think more of the industry will go (for consumers)

And regarding Google’s mobile tensor chips, they are infamously behind all other players in the market space for the same generation of processor. They don’t share the same advantages they do in the server space.

training bigger models gets you small models for free plus a higher upper bound in capabilities.

Apple just isn’t very capable in this space, not sure what’s so hard to accept

Apple have trained their own foundation LLM.
hardly even qualifies for ‘fast follow’, more like ‘surprisingly slow follow’

their models aren’t even that good. sorry apple fanboys but the talent isn’t there

"having famously been focused on privacy for its users and thus having no particular advantage in this space due to not having customer data to train on"

That may not be as big a disadvantage as you think.

Anthropic claim that they did not use any data from their users when they trained Claude 3.5 Sonnet.

sure but they certainly acquired data from mass scraping (including of data produced by their users) and/or data brokering aka paying someone to do the same.
It is likely Apple can get additional data by creating synthetic data for user interactions.

About 7 years ago I trained GAN models to generate synthetic data, and it worked so well. The state of the art has increased a lot in 7 years, so Apple will be fine.

For a while there I would have been in agreement with you, but the thought that models can be trained purely on synthetic data has been shown to be wrong on multiple levels. Synthetic data needs to be reviewed by individuals to ensure data quality, significantly reducing the speed at which an organization can adopt training data. Reasonable engineers would suggest that the answer to this is to have other language models review the synthetic data, but we have seen that this is what leads to model collapse due to compounding issues around hallucinations.

At best, synthetic data is a "slow follow" for training a model due to the need for human review, but a competitive model, it does not make.

yeah i’ve never understood the outsized optimism for apple’s ai strategy, especially on hn.

they’re a little bit less of a nobody than they used to be, but they’re basically a nobody when it comes to frontier research/scaling. and the best model matters way more than on-device which can always just be distilled later and find some random startup/chipco to do inference

Theory: Apple's lifestyle branding is quite important to the identity of many in the community here. I mean, look at the buy-in at launch for Apple Vision Pro by so many people on HN--it made actual Apple communities and publications look like jaded skeptics.
Oh please, this is the classic “everyone who chooses differently than myself is <superficial/dumb/misinformed>” argument that a lot of people use when it comes to tech nerd identity politics.

Is it really that hard to imagine people have different viewpoints and decisions than yourself without being painted as vapid airheads?

For clarity, I was only talking about the hardware side, not the software one. I don't think the models matter too much, by the time the hardware is ready there will be open models that Apple can take and modify to their liking.

Besides, did Anthropic and e.g. Mistral inherently have such troves of data to train on that Apple doesn't? For the last 6 months, Anthropic has had the SOTA model for the average production use case.

> Google also already has competitive silicon in this space with the Tensor series processors, which are being fabbed at Samsung plants today. There is no sitting and praying necessary on their part as they already compete.

Intel had a much bigger advantage with x86, and look where we are now. I find it hard to believe that creating a good AI chip isn't a much smaller challenge than it was to do Apple Silicon. The upcoming SE uses their in-house 5G modem, another huge hardware achievement that no one else has been able to do.

With that in mind, how can you bet against Apple when it comes to designing chips at this point? It's not like Amazon et al aren't producing their own AI chips too. Let alone all of the startups like Cerebras. That indicates the moat and barriers are likely much lower than for Apple Silicon or the 5G modem.

If I'm talking nonsense, do correct me.

The Android on chip AI is and has been leagues better than what is available on iOS.

If anything, I think the upcoming iOS AI update will bring them to a similar level as android/google.

But given inference-time compute, giving a strong reply reasonably fast takes a lot of compute that is only rarely used.

Economically this fits the cloud much better.

At what point does the on device stuff eat into their market share though? As on device gets better, who will pay for cloud compute? Other than enterprise use.

I’m not saying on device will ever truly compete at quality, but I believe it’ll be good enough that most people don’t care to pay for cloud services.

You're still focused on inference :)

inference basically does not matter, it is a commodity

You’re still focused on training :)

training doesn’t matter if inference costs are high and people don’t pay for them

but inference costs aren't high already and there are tons of hardware companies that can do relatively cheap LLM inference
Inference costs per invocation aren’t high. Scale it out to billions of users and it’s a different story.

Training is amortized over each inference, so the cost of inference also needs to include the cost of training to break even unless made up elsewhere
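
A rough sketch of the amortization argument above. Every number here is a made-up assumption for illustration (none comes from this thread), and `break_even_price` is a hypothetical helper: the break-even price per inference is the marginal serving cost plus the training spend divided by the number of inferences it is spread over.

```python
def break_even_price(training_cost, expected_inferences, marginal_cost_per_inference):
    """Price per inference at which revenue covers both training and serving."""
    if expected_inferences <= 0:
        raise ValueError("expected_inferences must be positive")
    # One-off training spend, spread evenly over every inference served.
    amortized_training = training_cost / expected_inferences
    return marginal_cost_per_inference + amortized_training


# Hypothetical figures, for illustration only.
TRAINING_COST = 100_000_000  # assumed one-off training spend, in dollars
MARGINAL_COST = 0.002        # assumed serving cost per inference, in dollars

for volume in (1_000_000_000, 100_000_000_000):
    price = break_even_price(TRAINING_COST, volume, MARGINAL_COST)
    print(f"{volume:>15,} inferences -> break-even ${price:.4f} each")
```

Under these assumptions the training share dominates at low volume and shrinks toward the marginal serving cost as volume grows, which is where the two sides of this sub-thread diverge.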

That makes no sense. Inference costs dwarf training costs pretty quickly if you have a successful product. Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.
> Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.

Stack enough GPUs and any of them can run o1. Building a chip to infer LLMs is much easier than building a training chip.

Just because one cost dwarfs another does not mean that this is where the most marginal value from developing a better chip will be, especially if other people are just doing it for you. Google gets a good model, inference providers will be begging to be able to run it on their platform, or to just sell google their chips - and as I said, inference chips are much easier.

Chip level is only a tiny part of the story. Training can happen with a big boy variant of "it works on my machine". Inference requires a worldwide network of GPUs. Chip level is the last thing you will be worrying about.
Each GPU costs ~50k. You need at least 8 of them to run mid-sized models. Then you need a server to plug those GPUs into. That's not commodity hardware.
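
Taking the figures in the comment above at face value (~50k per GPU, at least 8 of them), the per-node arithmetic can be sketched as below. The host server price is a placeholder assumption, not a figure from this thread.

```python
GPU_PRICE = 50_000          # from the comment above: "~50k" each, in dollars
GPUS_PER_NODE = 8           # from the comment above: "at least 8"
HOST_SERVER_PRICE = 30_000  # hypothetical placeholder for the server itself


def node_cost(gpu_price, gpus_per_node, host_server_price):
    """Up-front hardware cost of one inference node."""
    return gpu_price * gpus_per_node + host_server_price


gpus_only = GPU_PRICE * GPUS_PER_NODE
total = node_cost(GPU_PRICE, GPUS_PER_NODE, HOST_SERVER_PRICE)
print(f"GPUs alone: ${gpus_only:,}; with assumed host server: ${total:,}")
```
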
I don’t think the AI market will ever really be a healthy one until inference vastly outnumbers training. What does it say about AI if training is done more than inference?

I agree that the in-device inference market is not important yet.

done more != where the value is at

inference hardware is a commodity in a way that training is not

The majority of people want better performance; running locally is just a nice-to-have feature.
They’ll care though when they have to pay for it, or when they’re in an area with poor reception.
They pay to run it locally as well (more expensive hardware)

And sure, poor reception will be an issue, but most people would still absolutely take a helpful remote assistant over a dumb local assistant.

And you don't exactly see people complaining that they can't run Google/YouTube/etc locally.

Your first sentence has the fallacy that you’re attributing the cost of the device to a single feature against the cost of that single feature.

Most people are unlikely to buy the device for the AI features alone. It’s a value add to the device they’d buy anyway.

So you need the paid for option to be significantly better than the free one that comes with the device.

Your second sentence assumes the local one is dumb. What happens when local ones get better? Again how much better is the cloud one to compete on cost?

To your last sentence, it assumes data fetching from the cloud. Which is valid but a lot of data is local too. Are people really going to pay for what Google search is giving them for free?

I think it's a more likely assumption that on device performance will trail off device models by a significant margin for at least the next few years - of course if magically you can make it work locally with the same level of performance it would be better.

Plus a lot of the "agentic" stuff is interaction with the outside world, connectivity is a must regardless.

My point is that you do NOT need the same level of performance. You need an adequate level of performance that the cost to get more performance isn’t worth it to most people.
It isn't really hypothetical. Lots of good models run well on a modern MacBook Pro.
Poor reception is rapidly becoming a non-issue for most of the developed world. I can’t think of the last time I had poor reception (in America) and wasn’t on an airplane.

As the global human population increasingly urbanizes, it’ll become increasingly easy to blanket it with cell towers. Poor(er) regions of the world will increase reception more slowly, but they’re also more likely to have devices that don’t support on-device models.

Also, Gemini Flash is basically positioned as a free model: (nearly) free API, free in the GUI, free in search results, free in a variety of Google products, etc. No one will be paying for it.

Many major cities have significant dead spots for coverage. It’s not just for developing areas.

Flash is free for API use at a low rate limit. Gemini as a whole is not free to Android users (free right now, with subscription costs beyond a time period for advanced features) and isn’t free to Google without some monetary incentive. Hence why I also originally asked about private cloud compute alternatives with Google.

I ride a ferry from a city of 50k to a city of 700k in the US and work in a building with apartments upstairs, basically a concrete cave.

I see poor reception in both areas and only one has WiFi.

You can run a model >100x faster in the cloud compared to on device with DDR RAM. This would make up for the reception.
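
A back-of-the-envelope sketch of that trade-off, assuming a simple model where response time is the network round trip plus generation time. The local throughput and response length are made-up assumptions; only the 100x ratio comes from the comment above. It only covers a slow connection, not a missing one.

```python
def response_time(tokens, tokens_per_second, network_round_trip_s=0.0):
    """Seconds to produce a reply: network delay plus generation time."""
    return network_round_trip_s + tokens / tokens_per_second


def max_tolerable_round_trip(tokens, local_tps, cloud_tps):
    """Largest network delay at which the cloud still answers first."""
    return tokens / local_tps - tokens / cloud_tps


LOCAL_TPS = 10.0             # assumed on-device tokens per second
CLOUD_TPS = LOCAL_TPS * 100  # the ">100x faster" claim from the comment
TOKENS = 200                 # assumed reply length

local_s = response_time(TOKENS, LOCAL_TPS)
budget_s = max_tolerable_round_trip(TOKENS, LOCAL_TPS, CLOUD_TPS)
print(f"local: {local_s:.1f}s; cloud wins while round trip < {budget_s:.1f}s")
```
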
And you can’t run the cloud model at all if you can’t talk to the cloud.
Yes, but I can't imagine situations where I "have" to run a model when I don't have internet at that time. My life would be more affected by losing the rest of the internet than by having to run a small, stupid model locally. At the very least until hallucination is completely solved, as I need internet to verify the models.
You’re assuming the model is purely for generation though. Several of the Gemini features are lookup of things across data available to it. A lot of that data can be local to device.

That is currently Apple’s path with Apple Intelligence for example.

Hallucination can't be solved because bogus output is categorically the same sort of thing as useful output.

It has no world model. It doesn't know truth any more than it knows bullshit, just a statistical relationship between words.

Latency is a huge factor in performance, and local models often have a huge edge. Especially on mobile devices that could be offline entirely.
Definitely not when it comes to LLMs. The larger, more useful local models are not that fast, and latency is not an issue; just look at the voice function of Google's models or even OpenAI's Advanced Voice.
If the model weights are not open, you can't run them on device anyway.
The Pixel 9 runs many small proprietary Gemini models on the internal TPU.
And yet these new models still haven’t reached feature parity with Google Assistant, which can turn my flashlight on, but with all the power of burning down a rainforest, Gemini still cannot interact with my actual phone.
I just tried asking my phone to turn on the flashlight using Gemini. It worked. https://9to5google.com/2024/11/07/gemini-utilities-extension...
Ok I tried literally last week on Pixel 7a and it didn’t work. What model do you have? Maybe it requires a phone that can do on-device models?
Works on a Pixel 4A 5G..

Pretty sure that's not doing any fancy on-device models!

That said, there was a popup today saying that assistant is now using Gemini, so I just enabled it to try. Could well have changed in the last week.

I just tried it on my Galaxy S23 Ultra and it worked. I then disconnected internet and it did not work.
Gemini Nano weights have leaked and Google doesn't care about it. Google would definitely care if the Pro weights leaked.
Is there any phone in the world that can realistically run the Pro weights?