by Terretta 742 days ago
> Doing everything on-device would result in a horrible user experience. They might as well not participate in this generative AI rush at all if they hoped to keep it on-device.

On the contrary, over the last few months I've been shocked at how plausibly "on device" on a MacBook Pro or Mac Studio, running Llama 3 70B or Qwen2 72B, competes with last year's early GPT-4.

There are surprisingly few things you "need" 128GB of so-called "unified RAM" for, but with M-series processors and their memory bandwidth, this is a use case that shines.

From this thread covering performance of llama.cpp on Apple Silicon M-series …

https://github.com/ggerganov/llama.cpp/discussions/4167

"Buy as much memory as you can afford would be my bottom line!"
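
For a rough sense of why memory size and bandwidth matter so much here, a back-of-the-envelope sketch follows. It only counts the weights (no KV cache, no runtime overhead), treats decoding as purely bandwidth-bound, and the 800 GB/s figure is an assumed number for illustration, not something measured in the linked thread. Real quantization formats also tend to use somewhat more than their nominal bits per weight, so treat the output as ballpark only.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_sec(weights_size_gb: float, bandwidth_gb_s: float) -> float:
    """Crude upper bound on decode speed, assuming every weight is read
    from memory once per generated token and nothing else is a bottleneck."""
    return bandwidth_gb_s / weights_size_gb


if __name__ == "__main__":
    ASSUMED_BANDWIDTH_GB_S = 800  # assumption for illustration; check your chip's spec
    for bits in (16, 8, 4):
        size = weights_gb(70, bits)  # a 70B-parameter model
        bound = max_tokens_per_sec(size, ASSUMED_BANDWIDTH_GB_S)
        print(f"{bits:>2}-bit: ~{size:.0f} GB of weights, <= {bound:.1f} tokens/s")
```

Under these assumptions a 70B model needs about 140 GB at 16 bits, 70 GB at 8 bits and 35 GB at 4 bits for the weights alone, which is why the larger memory configurations are the ones that make this class of model practical.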

1 comments

Yes - but people don't want to pay $4k for a phone with 128GB of unified memory, do they?

And whilst the LLMs running locally are cool, they're still pretty damn slow compared to ChatGPT, or Meta's LLM.

That depends on what you want to do, though.

If I want some help coding or ideas about playlists, Gemini and ChatGPT are fine.

But when I'm writing a novel about an assassin with an AI assistant and the public model keeps admonishing me that killing people is bad and he should seek help for his tendencies, it's a LOT faster to just use an uncensored local LLM.

Or when I want to create some faces for the people in my RPG campaign and the online generator keeps telling me my 200+ word prompt is VERBOTEN. And finally I figure out that "nude lipstick" is somehow bad.

Again, it's faster to feed all this to a local model and just get it done overnight than fight against puritanised AIs.

To say nothing of battery life.