by ezst 102 days ago
> the reality is that AI would change everything we do

Your true-believer convictions don't matter here. Those AI accelerators are merely marketing stunts. They won't help your local inference because they are not general purpose enough for that, and they are too weak to be impactful. Most people won't ever run local inference because it sucks and is a resource hog most can't afford, and it goes against the interests of those scammy unprofitable corporations who are selling us LLMs as AI, as the silver bullet to every problem, and who got us here in the first place (they have already succeeded in that, by making computing unaffordable). There's little to no economical and functional meaning to those NPUs.

3 comments

> most people won't ever run local inference because it sucks and is a resource hog most can't afford

a) Local inference for chats sucks. Using LLMs for chatting is stupid though.

b) Local inference is cheap if you're not selling a general-purpose chatbot.

There's lots of fun stuff you can get with a local LLM that previously wasn't economically possible.

Two big ones are gaming (for example, text adventure games or complex board games like Magic: The Gathering) and office automation (word processors, Excel tables).

It surprises me that semantic search never gets mentioned here.

If you can use the NPU to process embeddings quickly, you get some incredible functionality, from photo search by subject to near-match email search.

For consumer applications, that’s what I’m most excited for. It turns something that used to require large teams, data, and bespoke models into a commodity that any app can use.
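To make the embedding idea concrete, here is a minimal sketch of semantic search as cosine-similarity ranking. Everything in it is an assumption for illustration: the file names, the 3-dimensional vectors, and the query vector are made up, and a real app would get its vectors from an embedding model (possibly running on the NPU) rather than hard-coding them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the names of the top_k items most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings; real ones have hundreds of dimensions
# and come from an embedding model, not from hand-written numbers.
PHOTO_INDEX = {
    "beach_sunset.jpg": [0.9, 0.1, 0.0],
    "dog_in_park.jpg": [0.1, 0.9, 0.2],
    "invoice_scan.png": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a text query such as "puppy playing outside".
QUERY_PUPPY = [0.2, 0.8, 0.1]

if __name__ == "__main__":
    print(search(QUERY_PUPPY, PHOTO_INDEX, top_k=1))
```

The ranking step itself is cheap. The expensive part is producing the vectors for every photo or email, which is the work an accelerator would be asked to do.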

> There's lots of fun stuff

Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".

For office automation, you'll get a lot more mileage with Claude and similar.

> Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".

Do people not buy gaming PCs and game consoles? Isn't that buying something because "there's lots of fun stuff?"

And while, sure, a business owner wouldn't be buying it for "fun stuff", if it was about being able to run the AI tools they want without the business risk of sending their most important data and intellectual property to an AI provider, wouldn't some think about it?

> Local inference for chats sucks.

/r/SillyTavernAI would disagree with you.

Roleplay isn't chat, it's gaming.

Yes, gaming is (of course) a big use case for LLMs.

Many people who use ST have a "serious" NVIDIA card.

We are talking about NPUs here.

Are you kidding? A good ratio of ST folks run finetunes of Mistral Nemo (if that tells you anything). Anyway, your core statement is simply wrong ("local chat sucks").

From their own GitHub:

> If you intend to do LLM inference on your local machine, we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM, but actual requirements may vary depending on the model and backend you choose to use.

Also, please be respectful when discussing technical matters.

P.S. I didn't say "local chat sucks".

> we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM

...which is not by any means a powerful GPU. Besides, the AMD Ryzen AI CPUs in question have plenty of capacity to run local LLMs, especially MoE ones; on MoE models with 3B active parameters, mini PCs equipped with these CPUs dramatically outperform any "3000-series NVIDIA graphics card with at least 6GB of VRAM".
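A rough way to see why the active parameter count matters: token-by-token decoding is usually limited by how fast the weights can be read from memory, so an upper bound on speed is memory bandwidth divided by the bytes of active weights read per token. The sketch below is a back-of-the-envelope estimate under that assumption only. The 100 GB/s bandwidth and the 4-bit quantization are hypothetical placeholders, not measurements of any particular chip, and the estimate ignores KV-cache reads and compute limits.

```python
def max_decode_tokens_per_second(active_params, bits_per_weight, bandwidth_gb_per_s):
    """Upper bound on decode speed if every active weight is read once per token.

    Assumes decoding is purely memory-bandwidth bound; real throughput is lower.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_per_s * 1e9 / bytes_per_token


# Hypothetical numbers, chosen only to show the ratio.
BANDWIDTH_GB_PER_S = 100  # placeholder, not a spec of any real device
BITS_PER_WEIGHT = 4       # assumed 4-bit quantization

moe_3b_active = max_decode_tokens_per_second(3e9, BITS_PER_WEIGHT, BANDWIDTH_GB_PER_S)
dense_30b = max_decode_tokens_per_second(30e9, BITS_PER_WEIGHT, BANDWIDTH_GB_PER_S)

if __name__ == "__main__":
    print(f"MoE, 3B active: up to {moe_3b_active:.1f} tokens/s")
    print(f"Dense 30B:      up to {dense_30b:.1f} tokens/s")
```

Note that the MoE model still has to fit entirely in memory. Only the per-token read traffic shrinks, which is why large unified memory and a small active parameter count go together.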

> please be respectful when discussing technical matters.

That is more applicable to your inappropriately righteous attitude than to mine.

> most people won't ever run local inference because it sucks and is a resource hog most can't afford

You have fallen headfirst into the "Not now, so never" fallacy. As if consumer hardware won't get more powerful, or models more economical.

> You have fallen headfirst into the "Not now, so never" fallacy.

Perhaps. Though we have empirical evidence of how far we can quantize and distill models before they become practically useless. That sets a bar for how large a local model needs to be for general use in order to compete with the cloud ones. We are talking in the area of 60GB for GPT-OSS/Qwen3.5, which is what enthusiasts are running on 32GB DDR5 + 24GB VRAM RTX 3090.
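For a sense of where a figure like 60GB comes from, weight storage is roughly parameter count times bits per weight divided by 8. The sketch below is just that arithmetic. The 120-billion parameter count is a hypothetical round number picked to land on 60 GB at 4 bits, not a claim about any specific model, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gb(total_params, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def headroom_gb(total_params, bits_per_weight, ram_gb, vram_gb):
    """Memory left over (negative if short) after placing weights in RAM plus VRAM."""
    return (ram_gb + vram_gb) - weight_footprint_gb(total_params, bits_per_weight)


HYPOTHETICAL_PARAMS = 120e9  # assumed round number for illustration only

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(HYPOTHETICAL_PARAMS, bits)
        print(f"{bits:>2}-bit: {size:.0f} GB")
    # 32 GB DDR5 + 24 GB VRAM, as in the setup described above.
    print(f"headroom at 4-bit: {headroom_gb(HYPOTHETICAL_PARAMS, 4, 32, 24):.0f} GB")
```

Under these assumptions the 32GB + 24GB setup is already at its limit before any context is loaded, which gives a feel for how tight the budget is.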

> As if consumer hardware won't get more powerful

Now I will let you, with that last fact in hand, plot a chart of how much it's been costing to provision that over the past 2 years and use it to prove me wrong about the affordability of local models.

Am I reading /r/antiai?

Parent comment is fair and technically accurate.

Do you have a real argument, especially a technical one, that you can contribute?

> Parent comment is fair and technically accurate.

In what way, precisely? That local LLMs "suck"? Is that a technical argument? Or this statement: "there's little to no economical and functional meaning to those NPUs." Is that an actual factual statement or emotionally charged verbal flatulence? And what does "they won't help your local inference because they are not general purpose enough for that" even mean? People successfully run largish MoE LLMs on AMD Ryzen AI mini PCs.

> Do you have a real argument, especially a technical one, that you can contribute?

What kind of argument do you want me to "contribute" wrt the ideological rant the "parent comment" had managed to produce?

Hey, OP here with their (apparently) controversial views. I stand firmly within those lines:

- I shouldn't be paying more for my next CPU because it has an NPU that I won't ever use. Give me the freedom of choice.

- Given that freedom of choice, it would seem that a majority would opt out (as seen recently by Dell), so the morals of all that are dubious.

- NPUs may not be completely stupid as a concept, in theory, but at this point in time they are proprietary black boxes purpose-built for marketing and micro-benchmarks. Give me something more general-purpose and open, and I will change my mind.

- …but the problem is, you can only build so much general-purpose computing into a bespoke processor. That's kind of its defining trait. So I won't hold my breath.

- Re: local inference for the masses, putting aside the NPU shortcomings from above: how large do you think an LLM needs to be so it's deemed useful by your average laptop user? What would the inference story be like, in your honest opinion (in terms of downloading the model, loading it in memory, roundtrip times)? And how often would the user realistically want to suffer through all that, versus just hopping to ${favorite_llm.ai} from their browser?

Anyhow, if that makes me "antiai", please, sign me up!

> I shouldn't be paying more for my next CPU because it has an NPU that I won't ever use. Give me the freedom of choice.

There is plenty to choose from.

> NPUs may not be completely stupid as a concept, in theory, but at this point in time they are proprietary black-boxes purpose-built for marketing and micro-benchmarks. Give me something more general-purpose and open, and I will change my mind

In fact, the linked article is not talking about NPUs in particular, but about Ryzen AI CPUs. These have unified memory and more compute than regular ones, which makes them very useful for inference.

> how large do you think an LLM needs to be so it's deemed useful by your average laptop user?

Depends on what they need it for. Useful autocomplete in an IDE starts at around 4B parameters.

> loading it in memory

Happens only once, usually takes around 10 seconds.

> roundtrip times

Negligible? It is local, after all.

> And how often would the user realistically want to suffer

No suffering involved.