Hacker News
by wizee 313 days ago
Privacy, both personal and for corporate data protection, is a major reason. Unlimited usage, offline use, supporting open source, not worrying about a good model being taken down, discontinued, or changed, and the freedom to use uncensored models or model fine-tunes are other benefits (though this OpenAI model is super-censored - “safe”).

I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow of course running on my DDR4-equipped server with no GPU).


+1 - I work in finance, and there's no way we're sending our data and code outside the organization. We have our own H100s.
Add big law to the list as well. There are at least a few firms here that I am just personally aware of running their models locally. In reality, I bet there are way more.
Add government here too (along with all the firms that service government customers)
Add healthcare. We cannot send our patients' data to a cloud provider.
A ton of EMR systems are cloud-hosted these days. There’s already patient data for probably a billion humans in the various hyperscalers.

I totally understand that approaches vary, but beyond EMR there’s work to augment radiologists with computer vision for better diagnosis, plus all sorts of other cloudy things.

It’s here. It’s growing. Perhaps in your jurisdiction it’s prohibited? If so I wonder for how long.

In the US, HIPAA requires that health care providers complete a Business Associate Agreement with any other orgs that receive PHI in the course of doing business [1]. It basically says they understand HIPAA privacy protections and will work to fulfill the contracting provider's obligations regarding notification of breaches and deletion. Obviously any EMR service will include this by default.

Most orgs charge a huge premium for this. OpenAI offers it directly [2]. Some EMR providers are offering it as an add-on [3], but last I heard, it's wicked expensive.

1: https://www.hhs.gov/hipaa/for-professionals/covered-entities...

2: https://help.openai.com/en/articles/8660679-how-can-i-get-a-...

3: https://www.ntst.com/carefabric/careguidance-solutions/ai-do...

Even if it's possible, there is typically a lot of paperwork to get that stuff approved.

There might be a lot less paperwork to just buy 50 decent GPUs and have the IT guy self-host.

Europe? US? In Finland doctors can send live patient encounters to Azure OpenAI for transcription and summarization.
In the US, it would be unthinkable for a hospital to send patient data to something like ChatGPT or any other public service.

Might be possible with certain specific regions/environments of Azure tho, because iirc they have a few that support government confidentiality type of stuff, and some that tout HIPAA compliance as well. Not sure about the details of those though.

Possibly stupid question, but does this apply to things like M365 too? Because just like with inference providers, the only thing keeping them from reading/abusing your data is a pinky-promise contract.

Basically, isn't your data as safe/unsafe in a SharePoint folder as it is when you send it to a paid inference provider?

Yep, companies are just paranoid because it's new, just like with the cloud back then. Sooner or later everyone will use an AI provider.
A lot of people and companies use local storage and compute instead of the cloud. Cloud data is leaked all the time.
Look at (private) banks in Switzerland: there are enough press releases, and I can confirm most of them.

Managing private clients' data is still a concern if it can be directly linked to them.

Only JB, I believe, has on-premises infrastructure for these use cases.

This is not a shared sentiment across the buy side. I’m guessing you work at a bank?
Does it mean that renting a bare-metal server with H100s is also out of the question for your org?
Do you have your own platform to run inference?
I do think devs are one of the genuine user groups for local models going forward. No price hikes or random caps dropped in the middle of the night, and in many instances I think local agentic coding is going to be faster than the cloud. It’s a great use case.
I am extremely cynical about this entire development, but even I think that I will eventually have to run stuff locally; I've done some of the reading already (and I am quite interested in the text to speech models).

(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)

I agree totally. My only problem is that local models on my old Mac mini run much slower than, for example, Gemini-2.5-flash. I have my Emacs set up so I can switch between a local model and one of the much faster commercial models.

Someone else responded to you about working for a financial organization and not using public APIs - another great use case.

These being mixture-of-experts (MoE) models should help. The 20b model only has 3.6b params active at any one time, so minus a bit of overhead the speed should be like running a 3.6b model (while still requiring the RAM of a 20b model).
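To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes token generation is limited by memory bandwidth (every active weight is read once per token), a quantization of about 4.6 bits per weight, and a hypothetical 100 GB/s of memory bandwidth. Those three values are assumptions for illustration, not measurements, and real throughput will land below the ceiling it computes.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights, in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def max_decode_tokens_per_sec(active_params_billion: float,
                              bits_per_weight: float,
                              bandwidth_gb_s: float) -> float:
    """Ceiling on decode speed if each token reads every active weight once."""
    return bandwidth_gb_s * 1e9 / weight_bytes(active_params_billion, bits_per_weight)


TOTAL_PARAMS_B = 20.0    # all experts must sit in RAM
ACTIVE_PARAMS_B = 3.6    # params actually used per token
BITS_PER_WEIGHT = 4.6    # assumed quantization level
BANDWIDTH_GB_S = 100.0   # assumed memory bandwidth (hypothetical machine)

# RAM is driven by the total parameter count.
ram_gb = weight_bytes(TOTAL_PARAMS_B, BITS_PER_WEIGHT) / 1e9

# Speed is driven by the active parameter count.
moe_ceiling = max_decode_tokens_per_sec(ACTIVE_PARAMS_B, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
dense_ceiling = max_decode_tokens_per_sec(TOTAL_PARAMS_B, BITS_PER_WEIGHT, BANDWIDTH_GB_S)

print(f"weights in RAM:       {ram_gb:.1f} GB")
print(f"MoE decode ceiling:   {moe_ceiling:.1f} tokens/s")
print(f"dense decode ceiling: {dense_ceiling:.1f} tokens/s")
print(f"MoE advantage:        {moe_ceiling / dense_ceiling:.2f}x")
```

Under this simple model the speed advantage is just the ratio of total to active params, while the RAM requirement stays that of the full 20b model. Overhead from attention, the KV cache, and expert routing is ignored.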

Here's the ollama version (4.6bit quant, I think?) run with --verbose:

    total duration:       21.193519667s
    load duration:        94.88375ms
    prompt eval count:    77 token(s)
    prompt eval duration: 1.482405875s
    prompt eval rate:     51.94 tokens/s
    eval count:           308 token(s)
    eval duration:        19.615023208s
    eval rate:            15.70 tokens/s

15 tokens/s is pretty decent for a low-end MacBook Air (M2, 24 GB of RAM). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/s is good enough.
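For anyone who wants to sanity-check those figures, the rates are just token counts divided by durations. The sketch below recomputes them from the numbers quoted above and estimates how long the same 308-token reply would take at the other speeds mentioned. The helper names are made up for illustration, and the ~250 tokens/s figure is the rough estimate from this comment, not a benchmark.

```python
def tokens_per_sec(tokens: int, seconds: float) -> float:
    """Throughput given a token count and a wall-clock duration."""
    return tokens / seconds


def seconds_to_generate(tokens: int, rate: float) -> float:
    """Time to produce `tokens` tokens at `rate` tokens per second."""
    return tokens / rate


# Figures copied from the ollama --verbose output quoted above.
PROMPT_TOKENS, PROMPT_SECONDS = 77, 1.482405875
EVAL_TOKENS, EVAL_SECONDS = 308, 19.615023208

prompt_rate = tokens_per_sec(PROMPT_TOKENS, PROMPT_SECONDS)
eval_rate = tokens_per_sec(EVAL_TOKENS, EVAL_SECONDS)
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate:        {eval_rate:.2f} tokens/s")

# How long the same reply would take at the speeds discussed in the thread.
for label, rate in [("local (measured)", eval_rate),
                    ("'good enough' floor", 10.0),
                    ("cloud (rough estimate)", 250.0)]:
    print(f"{label:>24}: {seconds_to_generate(EVAL_TOKENS, rate):6.2f} s")
```

The gap looks large in tokens/s, but for a reply of a few hundred tokens it is the difference between roughly a second and roughly twenty seconds, which is why a threshold around 10 tokens/s can still feel usable.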

Yes, and help with grant reviews. Not permitted to use web AI.