Hacker News
by phdelightful 1109 days ago
From my perspective, two main things seem to limit LLM adoption in my area of the Department of Energy. I'm not in management, so I don't have any particular insight into the procurement process.

1. Information sensitivity. Even ignoring classified information, there are quite a few things we can't even put into a Google search. It's definitely a no-go for this to end up in a training dataset.

2. "Hallucinations"

Making LLMs available through some infrastructure that is already approved for sensitive information will definitely help with the first point, and allow us to experiment with more areas where they might be helpful. I presume this would come along with guarantees about the interactions not being used for training.

It might be even better if some company would sell an appliance we could install on-prem with similar non-training guarantees. Then we could leverage these new tools for very sensitive information, which could be a great help.

4 comments

The first item is addressed in the article — data never leaves the government cloud and isn’t used for training.
https://www.theregister.com/2023/02/23/azure_dod_emails_expo...

> "Documents Sen shared with The Register said to be from the exposed server include a rich amount of data that [would] certainly be valuable to a foreign adversary. It included all the usual PII, as well as blood type, religious affiliation, educational background, military service history and more, all in plain text. Sen told us that close to 3TB of data was available before the Azure server was taken offline on Monday."

That’s not running in the gov cloud though.

And any operation can make a huge mistake like that. E.g. the guy who leaked all those documents to Discord.

I doubt that Microsoft has an agreement with OpenAI that would allow Microsoft to deploy the model on-prem for third parties. The danger is too high that the GPT-4 weights would leak. They are probably worth billions of dollars.

Yes, but you need quite expensive hardware to run it, I doubt the magic that goes into building the weights can be gleaned from them, and anyone using it (in certain jurisdictions) can be destroyed in court.

I understand they want to protect their IP, but I don’t think the model leaking will cost OpenAI billions.

>> Then we could leverage these new tools

No. These new tools cannot provide leverage. They just produce a street pizza (1) based on the inputs they are given. Whether the street pizza is any good depends on the quality of the ingredients and how discerning the consumer is.

1. https://englishdaily626.com/slang.php?173

> Even ignoring classified information, there are quite a few things we can't even put into a Google search

This doesn't say very much; Google is a public service. Government cloud providers have special regions just for the government that are compliant with data security policies.