| HN Mirror



	by wizee 313 days ago
	Privacy, both personal and for corporate data protection, is a major reason. Unlimited usage, allowing offline use, supporting open source, not worrying about a good model being taken down/discontinued or changed, and the freedom to use uncensored models or model fine-tunes are other benefits (though this OpenAI model is super-censored - “safe”). I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow, of course, running on my DDR4-equipped server with no GPU).

4 comments

M4R5H4LL 313 days ago

+1 - I work in finance, and there's no way we're sending our data and code outside the organization. We have our own H100s.


filoleg 313 days ago

Add big law to the list as well. There are at least a few firms here that I am just personally aware of running their models locally. In reality, I bet there are way more.


atlasunshrugged 312 days ago

Add government here too (along with all the firms that service government customers)


rasmus1610 312 days ago

Add healthcare. We cannot send our patients' data to a cloud provider.


nixgeek 312 days ago

A ton of EMR systems are cloud-hosted these days. There’s already patient data for probably a billion humans in the various hyperscalers.

Totally understand that approaches vary but beyond EMR there’s work to augment radiologists with computer vision to better diagnose, all sorts of cloudy things.

It’s here. It’s growing. Perhaps in your jurisdiction it’s prohibited? If so I wonder for how long.


fineIllregister 312 days ago

In the US, HIPAA requires that health care providers complete a Business Associate Agreement with any other orgs that receive PHI in the course of doing business [1]. It basically says they understand HIPAA privacy protections and will work to fulfill the contracting provider's obligations regarding notification of breaches and deletion. Obviously any EMR service will include this by default.

Most orgs charge a huge premium for this. OpenAI offers it directly [2]. Some EMR providers are offering it as an add-on [3], but last I heard, it's wicked expensive.

1: https://www.hhs.gov/hipaa/for-professionals/covered-entities...

2: https://help.openai.com/en/articles/8660679-how-can-i-get-a-...

3: https://www.ntst.com/carefabric/careguidance-solutions/ai-do...


londons_explore 312 days ago

Even if it's possible, there is typically a lot of paperwork to get that stuff approved.

There might be a lot less paperwork to just buy 50 decent GPUs and have the IT guy self-host.


kakoni 312 days ago

Europe? US? In Finland, doctors can send live patient encounters to Azure OpenAI for transcription and summarization.


filoleg 312 days ago

In the US, it would be unthinkable for a hospital to send patient data to something like ChatGPT or any other public service.

Might be possible with certain regions/environments of Azure, though, because IIRC they have a few that support government-confidentiality type of stuff, and some that tout HIPAA compliance as well. Not sure about the details of those, though.


LinXitoW 312 days ago

Possibly stupid question, but does this apply to things like M365 too? Because just like with inference providers, the only thing keeping them from reading/abusing your data is a pinky-promise contract.

Basically, isn't your data as safe/unsafe in a SharePoint folder as it is when you send it to a paid inference provider?


Bombthecat 312 days ago

Yep, companies are just paranoid because it's new. Just like the cloud back then. Sooner or later everyone will use an AI provider.


megaloblasto 312 days ago

A lot of people and companies use local storage and compute instead of the cloud. Cloud data is leaked all the time.


Foobar8568 312 days ago

Look at (private) banks in Switzerland: there are enough press releases, and I can confirm most of them.

Managing private clients' direct data is still a concern if it can be directly linked to them.

Only JB, I believe, has on-premise infrastructure for these use cases.


helsinki 312 days ago

This is not a shared sentiment across the buy side. I’m guessing you work at a bank?


undefuser 312 days ago

Does it mean that renting a bare-metal server with H100s is also out of the question for your org?


arkonrad 312 days ago

Do you have your own platform to run inference?


captainregex 313 days ago

I do think devs are one of the genuine long-term users of local models. No price hikes or random caps dropped in the middle of the night, and in many instances I think local agentic coding is going to be faster than the cloud. It’s a great use case.


exasperaited 312 days ago

I am extremely cynical about this entire development, but even I think that I will eventually have to run stuff locally; I've done some of the reading already (and I am quite interested in the text to speech models).

(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)


mark_l_watson 312 days ago

I agree totally. My only problem is that local models running on my old Mac mini run much slower than, for example, Gemini-2.5-flash. I have my Emacs set up so I can switch between a local model and one of the much faster commercial models.

Someone else responded to you about working for a financial organization and not using public APIs - another great use case.


gorbypark 312 days ago

These being mixture-of-experts (MoE) models should help. The 20b model only has 3.6b params active at any one time, so minus a bit of overhead the speed should be like running a 3.6b model (while still requiring the RAM of a 20b model).
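A rough back-of-the-envelope sketch of that trade-off, in Python. The parameter counts are the ones quoted above; the 4.6 bits per weight is only the guess made in this thread, and the function names are made up for illustration. It counts weights only and ignores KV cache and runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_params / total_params


TOTAL_PARAMS = 20e9    # "20b model", from the comment
ACTIVE_PARAMS = 3.6e9  # "3.6b params active", from the comment
BITS_PER_WEIGHT = 4.6  # assumption: the unconfirmed "4.6bit quant" guess

# Memory follows the total parameter count...
print(f"weights: ~{weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT):.1f} GB")
# ...while per-token compute follows the active parameter count.
print(f"active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
```

Under those assumptions the weights need about 11.5 GB while only about 18% of them are touched per token, which is why the model has to fit in RAM like a 20b model but can generate at something closer to the speed of a much smaller one.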

Here's the ollama version (4.6bit quant, I think?) run with --verbose:

total duration: 21.193519667s
load duration: 94.88375ms
prompt eval count: 77 token(s)
prompt eval duration: 1.482405875s
prompt eval rate: 51.94 tokens/s
eval count: 308 token(s)
eval duration: 19.615023208s
eval rate: 15.70 tokens/s

15 tokens/s is pretty decent for a low-end MacBook Air (M2, 24 GB of RAM). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/s is good enough.
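For anyone wanting to sanity-check those rates, here is a minimal Python sketch that recomputes them from the counts and durations in the --verbose output quoted above. The helper names are hypothetical, and the 250 tokens/s figure is just the approximate number mentioned in this comment, not a measurement.

```python
def tokens_per_second(token_count: int, duration_s: float) -> float:
    """Throughput: tokens processed divided by wall-clock seconds."""
    return token_count / duration_s


def seconds_for(token_count: int, rate_tps: float) -> float:
    """Time needed to generate token_count tokens at a given rate."""
    return token_count / rate_tps


# Counts and durations copied from the --verbose output above.
prompt_rate = tokens_per_second(77, 1.482405875)
eval_rate = tokens_per_second(308, 19.615023208)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 51.94
print(f"eval rate: {eval_rate:.2f} tokens/s")           # 15.70

# The same 308-token answer at the roughly 250 tokens/s quoted for 2.5-flash.
print(f"at 250 tokens/s: {seconds_for(308, 250):.2f} s")  # 1.23
```

So the same answer takes about 20 seconds locally versus a bit over a second at the quoted cloud rate, which is the gap being weighed against the 10 tokens/s "good enough" threshold.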


robwwilliams 313 days ago

Yes, and help with grant reviews. Not permitted to use web AI.
