Hacker News
by mupuff1234 551 days ago
The majority of people want better performance; running locally is just a nice-to-have feature.

They’ll care though when they have to pay for it, or when they’re in an area with poor reception.
They pay to run it locally as well (more expensive hardware).

And sure, poor reception will be an issue, but most people would still absolutely take a helpful remote assistant over a dumb local assistant.

And you don't exactly see people complaining that they can't run Google/YouTube/etc locally.

Your first sentence rests on a fallacy: you’re attributing the whole cost of the device to a single feature and then comparing that against the cost of that feature alone.

Most people are unlikely to buy the device for the AI features alone. It’s a value add to the device they’d buy anyway.

So you need the paid-for option to be significantly better than the free one that comes with the device.

Your second sentence assumes the local one is dumb. What happens when local ones get better? Again, how much better does the cloud one have to be to compete on cost?

As for your last sentence, it assumes the data is fetched from the cloud. That is valid, but a lot of data is local too. Are people really going to pay for what Google search is giving them for free?

I think it's a more likely assumption that on-device performance will trail off-device models by a significant margin for at least the next few years. Of course, if you could magically make it work locally with the same level of performance, that would be better.

Plus a lot of the "agentic" stuff is interaction with the outside world, so connectivity is a must regardless.

My point is that you do NOT need the same level of performance. You need a level of performance adequate enough that the cost of getting more isn’t worth it to most people.
And my point is that it's way too early to try to optimize for running locally. If performance really stabilizes and comes to a halt (which may well happen), then it makes more sense to optimize.

Plus, once you start with on-device features, you start limiting your development speed and flexibility.

It isn't really hypothetical. Lots of good models run well on a modern MacBook Pro.
Poor reception is rapidly becoming a non-issue for most of the developed world. I can’t think of the last time I had poor reception (in America) and wasn’t on an airplane.

As the global human population increasingly urbanizes, it’ll become increasingly easy to blanket it with cell towers. Poor(er) regions of the world will increase reception more slowly, but they’re also more likely to have devices that don’t support on-device models.

Also, Gemini Flash is basically positioned as a free model: (nearly) free API, free in the GUI, free in search results, free in a variety of Google products, etc. No one will be paying for it.

Many major cities have significant dead spots for coverage. It’s not just for developing areas.

Flash is free for API use at a low rate limit. Gemini as a whole is not free to Android users (it is free right now, with subscription costs for advanced features beyond a time period) and isn’t free to Google without some monetary incentive. Hence why I also originally asked about private cloud compute alternatives with Google.

I ride a ferry from a city of 50k to a city of 700k in the US, and I work in a building with apartments upstairs that is basically a concrete cave.

I see poor reception in both areas and only one has WiFi.

You can run a model >100x faster in the cloud than on a device with DDR RAM. This would make up for the reception.
And you can’t run the cloud model at all if you can’t talk to the cloud.
Yes, but I can't imagine a situation where I "have" to run a model while I don't have internet. Losing the rest of the internet would affect my life far more than being unable to run a small, stupid model locally. At the very least, until hallucination is completely solved, I need the internet to verify the models' output.
You’re assuming the model is purely for generation though. Several of the Gemini features look things up across the data available to them. A lot of that data can be local to the device.

That is currently Apple’s path with Apple Intelligence for example.

Hallucination can't be solved because bogus output is categorically the same sort of thing as useful output.

It has no world model. It doesn't know truth any more than it knows bullshit; it knows only a statistical relationship between words.

Latency is a huge factor in performance, and local models often have a huge edge. Especially on mobile devices that could be offline entirely.
Definitely not when it comes to LLMs. The larger, more useful local models are not that fast, and latency is not an issue: just look at the voice function of Google's models, or even OpenAI's Advanced Voice.
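
For anyone who wants to put rough numbers on the latency argument, here is a back-of-the-envelope sketch. It takes the ">100x faster in cloud" figure claimed upthread at face value; the 10 tokens/s on-device rate, the 200-token reply, and the 1.5 s of network overhead are made-up illustrative inputs, not measurements, and the function names are just for this example.

```python
# Back-of-the-envelope latency model. All inputs are hypothetical,
# chosen only to illustrate the trade-off discussed in the thread.

def local_seconds(tokens: int, local_tok_per_s: float) -> float:
    """Time to generate `tokens` on-device, with no network involved."""
    return tokens / local_tok_per_s


def cloud_seconds(tokens: int, local_tok_per_s: float,
                  speedup: float, network_overhead_s: float) -> float:
    """Time to generate `tokens` in the cloud: network overhead plus
    generation at `speedup` times the on-device rate."""
    return network_overhead_s + tokens / (local_tok_per_s * speedup)


def break_even_overhead_s(tokens: int, local_tok_per_s: float,
                          speedup: float) -> float:
    """Network overhead at which cloud and local take the same time."""
    return local_seconds(tokens, local_tok_per_s) * (1 - 1 / speedup)


if __name__ == "__main__":
    tokens = 200           # assumed reply length
    local_rate = 10.0      # assumed on-device tokens per second
    speedup = 100.0        # the ">100x" figure claimed upthread
    overhead = 1.5         # assumed round trip on a poor connection, seconds

    print("local :", round(local_seconds(tokens, local_rate), 2), "s")
    print("cloud :", round(cloud_seconds(tokens, local_rate, speedup, overhead), 2), "s")
    print("break-even overhead:",
          round(break_even_overhead_s(tokens, local_rate, speedup), 2), "s")
```

Under those assumptions the cloud stays ahead until the network overhead approaches the full on-device generation time. The model says nothing about the case raised upthread where there is no connection at all: then the cloud time is effectively unbounded, however fast the servers are.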