| HN Mirror


	by ron0c 535 days ago
	This is the AI I am excited for: data and execution local to my machine. I think Intel is betting on this with the Copilot-included processors. I hope Ollama or other local AI services will be able to utilize these co-processors soon.

2 comments

ekianjo 535 days ago

The NPUs on laptops don't have access to enough memory to run very large models.
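For a rough sense of the numbers, the memory needed just to hold a model's weights is roughly parameter count times bytes per weight. The sketch below is back-of-the-envelope arithmetic only: the helper name `weights_gib` and the example model sizes are illustrative assumptions, and it ignores activations, KV cache, and runtime overhead.

```python
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB.

    Ignores activations, KV cache, and runtime overhead (assumption:
    weights dominate the footprint).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative model sizes, not tied to any specific product.
    for params in (3.8, 7.0, 70.0):
        for bits in (16, 8, 4):
            print(f"{params:>5.1f}B params @ {bits:>2}-bit: "
                  f"{weights_gib(params, bits):6.1f} GiB")
```

Under these assumptions a 7B-parameter model at 16-bit needs about 13 GiB for weights alone, which is why quantization matters so much on memory-constrained laptops.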

talldayo 535 days ago

Oftentimes they do. If they don't, it's not very hard to page memory to and from the NPU until the operation is completed.

The bigger problem is that this NPU hardware isn't built around scaling to larger models. It's laser-focused on dense computation and low-precision inference, which usually isn't much more efficient than running the same matmul as a compute shader. For Whisper-scale models that don't require insanely high precision or super sparse decoding, NPU hardware can work great. For LLMs it is almost always going to be slower than a well-tuned GPU.
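The idea of paging memory to and from an accelerator can be illustrated with a blocked (tiled) matrix multiply, where only one small tile of each operand is "resident" at a time. This is a toy sketch in plain Python under stated assumptions: `matmul_tiled` is a hypothetical name, the slices merely stand in for host-to-device copies, and it does not reflect how any actual NPU runtime schedules work.

```python
def matmul_tiled(a, b, tile):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time.

    Each slice below stands in for copying a small block into a
    limited device memory; partial results accumulate in `out`.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # "Page in" one tile of each operand (slices make copies).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Compute the partial product for this pair of tiles.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0.0
                        for kk, a_val in enumerate(a_row):
                            acc += a_val * b_tile[kk][j]
                        out[i0 + i][j0 + j] += acc
    return out


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2))
```

The result is the same for any tile size; what changes is how much data has to be resident at once and how many transfers are needed, which is where the real cost of paging shows up.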

650REDHAIR 535 days ago

Right, but do most people need access to a huge model locally?

e12e 535 days ago

AFAIU NPUs are for things like voice input/output, computer vision and hand-gesture I/O, knowing how many people (and who) are in front of the camera, etc. Always-on, real-time "AI peripherals", not content generation.

I believe Microsoft calls them "SLMs - Small Language Models".

ben_w 535 days ago

Most people shouldn't host locally at all.

Of those who do, I can see students and researchers benefiting from small models. Students in particular are famously short on money for fancy hardware.

My experience trying one of the Phi models (I think 3, might have been 2) was brief, because it failed so hard. My first test was to ask for a single-page web app Tetris clone. Not only did the first half of the output simply do that task wrong, the second half took a sudden sharp turn into Python code to train an ML model. It didn't even delimit the transition: one line JavaScript, the next Python.

diggan 535 days ago

> My experience trying one of the Phi models (I think 3, might have been 2) was brief

The Phi models are tiny LMs; maybe SLM is a more fitting label than LLM (Large -> Small). As such, you cannot throw even semi-complicated problems at them. Things like "autocomplete" and other simpler things are the use cases you'd use it for, not "code this game for me", you'll need something much more powerful for that.

ben_w 535 days ago

> Things like "autocomplete" and other simpler things are the use cases you'd use it for, not "code this game for me", you'll need something much more powerful for that.

Indeed, clearly.

However, it was tuned for chat, and people kept telling me it was competitive with the OpenAI models for coding.

ron0c 526 days ago

Asking a leading LLM to "code a game" is a tall order. I have found a lot of success using self-hosted small models to accomplish coding that would have taken me months without them. I just break the "code me a game" task down into its parts.

Think of it like an extended autocomplete.

miohtama 535 days ago

Maybe a better solution is a privately hosted cloud solution, or just any SaaS that cannot violate data privacy by design.

sofixa 535 days ago

> any SaaS that cannot violate data privacy by design

And that is hosted in a jurisdiction that forces them to take it seriously, e.g. Mistral in France, which has to comply with GDPR and any other AI and privacy regulations coming out of the EU.
