| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by saberience 148 days ago

What’s the main use-case for this?

I get that I can run local models, but all the paid for (remote) models are superior.

So is the use-case just for people who don’t want to use big tech’s models? Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots?

Not being cynical here, just wanting to understand the genuine reasons people are using it.

14 comments

biddit 148 days ago

Yes, frontier models from the labs are a step ahead and likely will always be, but we've already crossed levels of "good enough for X" with local models. This is analogous to the fact that my iPhone 17 is technically superior to my iPhone 8, but my outcomes for text messaging are no better.

I've invested heavily in local inference. For me, it's a mixture privacy, control, stability, cognitive security.

Privacy - my agents can work on tax docs, personal letters, etc.

Control - I do inference steering with some projects: constraining which token can be generated next at any point in time. Not possible with API endpoints.

Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization due to system load. Worse, they retire models, update their own system prompts, etc. They're not stable.

Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.

metalliqaz 148 days ago

I just "invested heavily" (relatively modest, but heavy for me) in a PC for local inference. The RAM was painful. Anyway, for my focused programming tasks the 30B models are plenty good enough.

samarthr1 148 days ago

I am extremely fortunate having bought 64GB of CL30 DDR5 Ram for ~200 USD just 4 months ago!

My computer is now worth more than when I bought it

metalliqaz 143 days ago

ugh. $650 for 64GB DDR5-6000, CL36

dragonwriter 148 days ago

> What’s the main use-case for this?

Running weights available models.

> I get that I can run local models, but all the paid for (remote) models are superior.

If that's clearly true for your use cases, then maybe this isn’t for you.

> So is the use-case just for people who don’t want to use big tech’s models?

Most weights available models are also “big tech’s”, or finetunes of them.

> Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots?

Sure, those are among the use cases. And there can be very good reasons to be concerned about privacy in some applications. But they aren’t the only reasons.

There’s a diversity of weights-available models available, with a variety of specialized strengths. Sure, for general use, the big commercial models may generally be more capable, but they may not be optimal for all uses (especially when cost effectiveness is considered, given that capable weights-available models for some uses are very lightweight.)

PeterStuer 148 days ago

For some projects, you do not want your code or documents leaving the LAN. Many companies have explicit constraints on using external SaaS. It does not mean they restrict to everything 'on prem'. 'Self hosted' can include running an open weights model on multiple rented B200's.

So yes, the tradeoff is security vs capability. The former always comes at a cost.

maxkfranz 148 days ago

Yeah, it’s not going to compare to Codex-5.2 or Opus 4.5.

Some non-programming use cases are interesting though, e.g. text to speech or speech to text.

Run a TTS model overnight on a book, and in the morning you’ll get an audiobook. With a simple approach, you’d get something more like the old books on tape (e.g. no chapter skipping), but regardless, it’s a valid use case.

JayDustheadz 147 days ago

Which TTS would you suggest? Anything out there that is able to properly see/handle modulation, punctuation and overall sentence 'mood'? I've been looking for something easy to set up but most is either extremely complex or is producing output of relatively poor quality.

maxkfranz 144 days ago

I’m still experimenting with them. I suspect you may have to do only one paragraph at a time and concatenate them together. Let me know if you’d be interested in collaborating, as I’m interested in this use case too.

marak830 148 days ago

I run a separate memory layer between my local and my chat.

Without a ton of hassle I cannot do that with a public model(without paying API pricing).

My responses may be slower, but I know the historical context is going to be there. As well as the model overrides.

In addition I can bolt on modules as I feel like it(voice, avatar, silly tavern to list a few).

I get to control my model by selecting specific ones for tasks, I can upgrade as they are released.

These are the reasons I use local.

I do use Claude for a coding junior so I can assign tasks and review it, purely because I do not have something that can replicate that locally on my setup(hardware wise, but from what I have read local coding models are not matching Claude yet)

That's more than likely a temporary issue(years not weeks with the expensive of things and state of open models specialising in coding).

konart 148 days ago

For many tasks you don't really need big models. And relatively small model, quantized too can be run on your macbook (not to mention Mac studio).

tiderpenger 148 days ago

To justify investing a trillion dollars like everything else LLM-related. The local models are pretty good. Like I ran a test on R1 (the smallest version) vs Perplexity Pro and shockingly got better answers running on base spec Mac Mini M4. It's simply not true that there is a huge difference. Mostly it's hardcoded overoptimalization. In general these models aren't really becoming better.

mk89 148 days ago

I agree with this comment here.

For me the main BIG deal is that cloud models have online search embedded etc, while this one doesn't.

However, if you don't need that (e.g., translate, summarize text, writing code) probably is good enough.

prophesi 148 days ago

So long as the local model supports tool-use, I haven't had issues with them using web search etc in open-webui. Frontier models will just be smarter in knowing when to use tools.

mk89 148 days ago

Ok I need to explore this, I didn't do it yet. Thanks.

dragonwriter 148 days ago

> For me the main BIG deal is that cloud models have online search embedded etc, while this one doesn't.

Models do not have online search embedded, they have tool use capabilities (possibly with specialized training for a web search tool), but that's true of many open and weights-available models, and they are run with harnesses that support tools and provide a web search tool (lmstudio is such a harness, and can easily be supplied with a web search tool.)

nunodonato 148 days ago

you can do web searches in lm studio. just connect an mcp that does it. Serpapi has an mcp, for example

mark_l_watson 148 days ago

Also, I had several experiments where I was interested in just 5 to 10 websites with application specific information so it works nicely for fast dev to spider, keep a local index, then get very low search latency. Obviously this is not a general solution but is nice for some use cases.

PlatoIsADisease 148 days ago

I originally used local models as a somewhat therapeutic/advice thing. I didn't want to give openAI all my dirt.

But then I decided I'm just a chemical reaction and a product of my environment, so I gave chatGPT all my dirt anyway.

But before, I cared about my privacy.

anon373839 148 days ago

> But then I decided I'm just a chemical reaction

That doesn’t address the practical significance of privacy, though. The real risk isn’t that OpenAI employees will read your chats for personal amusement. The risk is that OpenAI will exploit the secrets you’ve entrusted to them, to manipulate you, or to enable others to manipulate you.

The more information an unscrupulous actor has about you, the more damage they can do.

numpad0 148 days ago

Reports of people getting hit by twitchy fingered banbots on cloud LLMs are starting to show up(Gemini bans apparently kill Gmail and GDrive too). Paranoid types like I am appreciate local options that won't get me banned.

reactordev 148 days ago

Not always. Besides, this allows one to use a post-trained model, a heretic model, an abliterated model, or their own.

I exclusively run local models. On par with Opus 4.5 for most things. gpt-oss is pretty capable. Qwen3 as well.

nubg 148 days ago

> On par with Opus 4.5 for most things

?

Are you asking it for capital cities or what?

reactordev 148 days ago

No…

I’m asking it to write C code

gostsamo 148 days ago

currently working on a personal project where part of the pipeline is recognizing lots of images. the employer let me use gemini for personal use, but wasting large amount of tokens on gemini3 pro ocr limited my work. flash gives worse result, but there are ways to retry. good for development, but long term, simpler parts of a pipeline could be dedicated to a local model. I can imagine many other use cases where you want large volume of low difficulty tasks at close to zero cost.

hickelpickle 148 days ago

I've gotten interested in local models recently after trying the here and there for years. We've finally hit the point where small <24GB models are capable of pretty amazing things. One use I have is I have a scraped forum database, and with a 20gb devstral model I was able to get it to select a bunch of random posts related to a species of exotic plants in batches of 5-10 up to n, summarize them into and intern sqllite table, then at the end go through read the interim summarization and write a final document addressing 5 different topics related to users experience growing the species.

Thats what convinced me they are ready to do real work, are they going to replace claude code...not currently. But it is insane to me that such a small model can follow those explicit directions and consistently perform that workflow.

I've during that experimentation, even when not putting the sql explicit it was able to craft the queries on its own from just text description, and has no issue navigating the cli and file system doing basic day to day things.

I'm sure there are a lot of people doing "adult" things, but my interest is sparked because they finally at the level they can be a tool in a homelab, and no longer is llm usage limits subsidized like they used to be. Not to mention I am really disillusioned with big tech having my data or exposing a tool making API calls to them that then can make actions on my system.

I'll still keep using claude code day to day coding. But for small system based tasks I plan on moving to local llms. Their capabilities have inspired me to write my own agentic framework to see what work flows can be put together for just management and automation of day to day task. Ideally it would be nice to just chat with an llm and tell it to add an appointment or call at x time or make sure I do it that day and it can read my schedule and remind-me at a chill time of my day to make the call, and then check up that I followed through. I also plan on seeing if I can also set it up to remind me and help to practice mindfulness and just general stress management I should do. While sure a simple reminder might work, but as someone with adhd who easily forgets reminders as soon as they pop up if I can get to them now, being pestered by an agent that wakes up and engages with me seems like it might be an interesting workflow.

And the hacker aspect, now that they are capable I really want to mess around with persistent knowledge in databases and making them intercommunicate and work together. Might even give them access to rewrite themselves and access the application during run time with a lisp. But to me local llms have gotten to the point they are fun and not annoying. I can run a model that is better than chatgpt 3.5 for the most part, its knowledge is more distilled and narrower, but for what they do understand their correctness is much better.

nxobject 148 days ago

There are some surprisingly useful "small" use cases for general-purpose LLMs that don't necessarily require broad knowledge – image transcription plus some light post-processing is one I use a lot.

anonym29 148 days ago

TL;DR: The classic CIA triad: Confidentiality, Integrity, Availability; cost/price concerns; the leading open-weight models aren't nearly as bad as you might think.

You don't need LM Studio to run local models, it just (was, formerly), a nice UI to download and manage HF models and llama.cpp updates, quickly and easily manually switch between CPU / Vulkan / ROCm / CUDA (depending on your platform).

Regarding your actual question, there are several reasons.

First off, your allusion to privacy - absolutely, yes, some people use it for adult role-play, however, consider the more productive motivations for privacy, too: a lot of businesses with trade secrets they may want to discuss or work on with local models without ever releasing that information to cloud providers, no matter how much those cloud providers pinky promise to never peek at it. Google, Microsoft, Meta, et al have consistently demonstrated that they do not value or respect customer privacy expectations, that they will eagerly comply with illegal, unconstitutional NSA conspiracies to facilitate bulk collection of customer information / data. There is no reason to believe Anthropic, OpenAI, Google, xAI would act any differently today. In fact, there is already a standing court order forcing OpenAI to preserve all customer communications, in a format that can be delivered to the court (i.e. plaintext, or encryption at rest + willing to provide decryption keys to the court), in perpetuity (https://techstartups.com/2025/06/06/court-orders-openai-to-p...)

There are also businesses which have strict, absolute needs for 24/7 availability and low latency, which remote APIs never have offered. Even if the remote APIs were flawless, and even if the businesses have a robust multi-WAN setup with redundant UPS systems, network downtime or even routing issues are more or less an inevitable fact of life, sooner or later. Having local models means you have inference capability as long as you have electricity.

Consider, too, the integrity front: frontier labs may silently modify API-served models to be lower quality for heavy users with little means of detection by end users (multiple labs have been suspected / accused of this; a lack of proof isn't evidence that it didn't happen) or that the API-served models can be modified over time to patch behaviors that may have been previously relied upon for legitimate workloads (imagine a red team that used a jailbreak to get a model to produce code for process hollowing, for instance). This second example absolutely has happened with almost every inference provider.

The open weight local models also have zero marginal cost besides electricity once the hardware is present, unlike PAYG API models, which create financial lock-in and dependency that is in direct contrast with the financial interests of the customers. You can argue about the amortized costs of hardware, but that's a decision for the customer to make using their specific and personal financial and capex / hardware information that you don't have at the end of the day.

Further, the gap between frontier open weight models and frontier proprietary models has been rapidly shrinking and continues to. See Kimi K2.5, Xiaomi MiMo v2, GLM 4.7, etc. Yes, Opus 4.5, Gemini 3 Pro, GPT-5.2-xhigh are remarkably good models and may beat these at the margin, but most work done via LLMs does not need the absolute best model; many people will opt for a model that gets 95% of the output quality of the absolute frontier model when it can be had for 1/20th the cost (or less).