Hacker News
by miletus 286 days ago
This is exactly why a lot of people are running local LLMs or moving toward privacy-first platforms.

We recently shipped Secure Mode on https://www.agentsea.com.

With Secure Mode, all chats run either on open-source models or models hosted on our own servers - so you can chat with AI without worrying about privacy.

3 comments

There is a very strong use case for less-powerful but local LLMs, and there's going to be a big expansion in that area in the next couple years. So big, I'll bet that all the major AI players will do everything they can to cripple them.
Not Nvidia, that's their best scenario
Unfortunately I think you're overestimating how many people care enough about privacy to go through the effort of running LLMs locally and likely buying a GPU
tbh i just run local because i can. No real reason to.

edit: there are some instances where i would like to be able to set the same seed repeatedly which isn't always possible online.
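
As a rough illustration of why a fixed seed matters: sampling from a model is a pseudo-random draw, so the same seed replays the same sequence. The sketch below is only a toy, not any real inference API; the vocabulary, weights, and the `sample_tokens` helper are made up for the example.

```python
import random

# Hypothetical next-token distribution (an assumption for illustration);
# a real model would produce a fresh distribution at every step.
TOY_VOCAB = ["the", "cat", "sat", "on", "mat"]
TOY_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]


def sample_tokens(seed: int, n: int = 8) -> list[str]:
    """Draw n tokens using a private RNG seeded with `seed`."""
    rng = random.Random(seed)  # private RNG, so no global state is touched
    return rng.choices(TOY_VOCAB, weights=TOY_WEIGHTS, k=n)


first = sample_tokens(seed=1234)
second = sample_tokens(seed=1234)
print(first == second)  # same seed, same sequence
```

When you run the model yourself you control the seed; a hosted service may or may not expose one.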

Realistically, how useful is local LLM usage? What are your use cases, hardware, and models used?
I have an old system with 3 ancient Tesla K40s which can easily run inference on ~30B parameter models (e.g. qwen3-coder:30b). I mostly use it as a compute box for other workloads, but it's not completely incapable for some AI-assisted coding. It is power hungry though, and the recent spike in local electricity rates is enough of an excuse to keep it off most of the time.
I'm surprised the accelerators-of-yore trick actually worked. Is balancing a trio only trivially more difficult than a duo? I enjoy the idea of having tons of VRAM and system RAM, loading a big model, and getting responses a few times per hour, as long as they're high quality.
Yeah, I was equally surprised. I am using a patched version of ollama to run the models: https://github.com/austinksmith/ollama37 which has a trivial change to allow it to run on old CUDA compute capabilities (3.5, 3.7). Obviously this was before tensor cores were a thing, so you're not going to be blown away by the performance, but it was cheap. I got 3x K40s for $75 on eBay; they are passively cooled, so they do need to be in a server chassis.
>Realistically, how useful is local LLM usage?

For me, none really, just as a toy. I don't get much use out of online either. There was a Kaggle competition to find issues with OpenAI's open weights model, but because my RTX GPU didn't have enough memory i had to run it very slowly from CPU/RAM.

Maybe other people have actual uses, but i don't

The differentiator is that locally, you can use abliterated models - models where they undid the guardrails.
Lots of people already have RTX 3090/4090/5090 for gaming and they can run 30b-class models at 40+ tok/sec. There is a huge field of models and finetunes of this size on huggingface. They are a little bit dumber than the big cloud models but not by much. And being able to run them 24/7 for just the price of electricity (and the privacy) is a big pull.
> they can run 30b-class models at 40+ tok/sec.

No, they can run quantized versions of those models, which are dumber than the base 30b models, which are much dumber than > 400b models (from my use).
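
For a rough sense of why quantization is needed here, a back-of-the-envelope sketch of weight memory at different precisions. It assumes a 24 GB card (3090/4090 class) and counts weights only, ignoring KV cache, activations, and the per-block scale overhead that real q4 formats add; `weight_memory_gb` is a made-up helper.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


VRAM_GB = 24  # assumption: a 3090/4090-class card

for label, bits in [("bf16", 16), ("q8", 8), ("q4", 4)]:
    need = weight_memory_gb(30, bits)
    print(f"30B at {label}: ~{need:.0f} GB, fits in {VRAM_GB} GB: {need <= VRAM_GB}")
```

By this estimate a 30B model needs about 60 GB at bf16 and about 15 GB at 4 bits, so only the quantized version fits on a single gaming card.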

> They are a little bit dumber than the big cloud models but not by much.

If this were true, we wouldn't see people paying the premiums for the bigger models (like Claude).

For every use case I've thrown at them, it's not a question of "a little dumber", it's the binary fact that the smaller models are incapable of doing what I need with any sort of consistency, and hallucinate at extreme rates.

What's the actual use case for these local models?

With quantization-aware-training techniques, q4 models are less than 1% off from bf16 models. And yes, if your use case hinges on the very latest and largest cloud-scale models, there are things they can do that the local ones just can't. But having them spitting tokens 24/7 for you would have you paying off a whole enterprise-scale GPU in a few months, too.

If anyone has a gaming GPU with gobs of VRAM, I highly encourage they experiment with creating long-running local-LLM apps. We need more independent tinkering in this space.

> But having them spitting tokens 24/7 for you would have you paying off a whole enterprise-scale GPU in a few months, too.

Again, what's the use case? What would make sense to run, at high rates, where output quality isn't much of a concern? I'm genuinely interested in this question, because answering it always seems to be avoided.

What kind of interactions do you have? Brainstorming, knowledge framework, rubber-duck debugging plus? Please help me understand, because I have a 3090 sitting without a suitable rest of the system and I wonder whether to invest or not.
Given that this is in response to a ChatGPT user who killed his mother and then himself, I'm not sure that positioning your product as being more secure than ChatGPT is wise, because your marketing here suggests either:

1. Profound tone-deafness about appropriate contexts for privacy messaging

2. Intentional targeting of users who want to avoid safety interventions

3. A fundamental misunderstanding of your ethical obligations as an AI provider

None of these interpretations reflect well on AgentSea's judgment or values.

I disagree. The fact that the crimes done by a mentally ill person are going to be used as a justification for surveillance on the wider population of users is a strong ethical reason to advocate for more security.
Yeah, it'd be terrible if all our emails, DNS queries, purchase histories, messages, Facebook posts, Google searches, in-store purchases, driving and GPS info were being tracked, cataloged, and sold to anyone who wants it! Why, people would never stand for such surveillance!

Anyone with half a brain complaining about hypothetical future privacy violations on some random platform just makes me spit milk out my nose. What privacy?! Privacy no longer exists, and worrying that your chat logs are gonna get sent to the authorities seems to me like worrying that the cops are gonna give you a parking ticket after your car blew up because you let the mechanic put a bomb in the engine.

Things suck therefore it doesn't matter if things suck even more.

Just not a very good argument.

Or maybe I just want to be able to talk to an LLM without worrying about whether it's going to report me to the authorities.
that’s a good point, privacy is important.

To play devil's advocate for a second, what if someone that’s mentally ill uses a local LLM for therapy and doesn’t get the help they need? Even if it’s against their will? And they commit suicide or kill someone because the LLM said it’s the right thing to do…

Is being dead better, or is having complete privacy better? Or does it depend?

I use local LLMs too, but it’s disingenuous to act like they solve the _real_ problem here. Mentally ill people trying to use an LLM for therapy. It can end catastrophically.

I don't want to deal with prompt injection attacks leading to being swatted. That's where all this reporting to the authorities is leading and it's not looking fun.

> Is being dead better, or is having complete privacy better? Or does it depend?

I know you're being provocative, but this feels like a false dichotomy. Mental health professionals are pro-privacy AND have mandatory reporting laws based on their best judgement. Do we trust LLMs to report a suicidal person that has been driven there by the LLM itself?

LLMs can't truly be controlled and can't be designed to not encourage mentally ill people to kill themselves.

> Mentally ill people trying to use an LLM for therapy

Yes indeed this is one of the core problems. I have experimented with this myself and the results were highly discouraging. Others that don't have the same level of discernment for LLM usage may mistake the confidence of the output for a well-trained therapist.

I too think there should be no rules or attempts to derisk any situation, just let us die
Are you in America? Do you also support banning guns?