by otabdeveloper4 100 days ago

> most people won't ever run local inference because it sucks and is a resource hog most can't afford

a) Local inference for chats sucks. Using LLMs for chatting is stupid though.

b) Local inference is cheap if you're not selling a general-purpose chatbot.

There's lots of fun stuff you can get with a local LLM that previously wasn't economically possible.

Two big ones are gaming (for example, text adventure games or complex board games like Magic: The Gathering) and office automation (word processors, Excel tables).

3 comments

data-ottawa 100 days ago

It surprises me that semantic search never gets mentioned here.

If you can use the NPU to process embeddings quickly, you get some incredible functionality, from photo search by subject to near-match email search.

For consumer applications, that’s what I’m most excited about. It turns something that used to require large teams, data, and bespoke models into a commodity that any app can use.
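To make the "near-match email search" idea concrete, here is a minimal sketch of the ranking step behind embedding search: embed the query, embed each document, and sort by cosine similarity. Everything named here is an assumption for illustration. In particular, `toy_embed` is a hypothetical stand-in that only counts words, so it matches on shared vocabulary rather than meaning; a real app would swap in an actual embedding model (possibly running on the NPU) and keep the ranking code as is.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse word-count vector.

    A real embedding model would return a dense float vector that captures
    meaning; this one only captures which words appear.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k documents ranked by similarity to the query."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample data, purely for illustration.
EMAILS = [
    "Invoice for March hosting is attached",
    "Lunch on Friday?",
    "Your hosting invoice for April",
]

if __name__ == "__main__":
    for score, subject in search("hosting invoice", EMAILS, top_k=2):
        print(f"{score:.2f}  {subject}")
```

In practice the document vectors would be computed once and cached, so only the query needs to be embedded at search time.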

g947o 100 days ago

> There's lots of fun stuff

Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".

For office automation, you'll get a lot more mileage with Claude and similar.

vel0city 100 days ago

> Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".

Do people not buy gaming PCs and game consoles? Isn't that buying something because "there's lots of fun stuff?"

And sure, a business owner wouldn't be buying it for "fun stuff". But if it meant being able to run the AI tools they want without the business risk of sending their most important data and intellectual property to an AI provider, wouldn't some think about it?

BoredomIsFun 100 days ago

> Local inference for chats sucks.

/r/SillyTavernAI would disagree with you.

otabdeveloper4 98 days ago

Roleplay isn't chat, it's gaming.

Yes, gaming is (of course) a big use case for LLMs.

g947o 100 days ago

Many people who use ST have a "serious" NVIDIA card.

We are talking about NPUs here.

BoredomIsFun 100 days ago

Are you kidding? A good share of ST folks run finetunes of Mistral Nemo (if that tells you anything). Anyway, your core statement ("local chat sucks") is simply wrong.

g947o 99 days ago

From their own GitHub:

> If you intend to do LLM inference on your local machine, we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM, but actual requirements may vary depending on the model and backend you choose to use.

Also, please be respectful when discussing technical matters.

P.S. I didn't say "local chat sucks".

BoredomIsFun 99 days ago

> we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM

...which is not by any means a powerful GPU. Besides, the AMD Ryzen AI CPUs in question have plenty of capacity to run local LLMs, especially MoE ones; on MoE models with 3B active parameters, a mini PC equipped with one of these CPUs dramatically outperforms any "3000-series NVIDIA graphics card with at least 6GB of VRAM".

> please be respectful when discussing technical matters.

That is more applicable to your inappropriately righteous attitude than to mine.
