| HN Mirror



	by alnorth 1255 days ago
	So if my competitor is using IngestAI and OpenAI use their data to train ChatGPT, could I literally just ask ChatGPT to tell me some secrets from my competitor's internal communication?

4 comments

literalAardvark 1255 days ago

2023 is going to be a very exciting time for security engineers.

It won't have the data, but it might have enough of an understanding of the data to leak important information.


vidarh 1255 days ago

While the model clearly can't retain all data, ChatGPT can regurgitate a lot of stuff verbatim.

Prompt:

> Recite the first two paragraphs of Neuromancer.

Response:

> Certainly! Here are the first two paragraphs of "Neuromancer" by William Gibson:

> "The sky above the port was the color of television, tuned to a dead channel.

> 'It's not like I'm using,' Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. 'It's like my body's developed this massive drug deficiency.' It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese."

(I have not checked how far you can get it to continue)

So perhaps it'll be a question of whether enough of your employees are feeding it copies of your data for it to retain it...
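One rough way to check how much of a known text a model repeats verbatim is to measure the longest character run shared by the reference and the response. The sketch below uses Python's standard `difflib`; the helper names and the sample strings are hypothetical and are not real model output.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(reference: str, output: str) -> str:
    """Return the longest substring that appears verbatim in both texts."""
    # autojunk=False keeps difflib from discarding very frequent characters
    # when the texts are long.
    matcher = SequenceMatcher(None, reference, output, autojunk=False)
    match = matcher.find_longest_match(0, len(reference), 0, len(output))
    return reference[match.a : match.a + match.size]


def verbatim_ratio(reference: str, output: str) -> float:
    """Fraction of the reference covered by the longest shared run."""
    if not reference:
        return 0.0
    return len(longest_verbatim_run(reference, output)) / len(reference)


# Hypothetical example data, invented for illustration only.
reference = "The quarterly revenue forecast is confidential until the board meets."
output = "Sure! I recall that the quarterly revenue forecast is confidential, roughly."

run = longest_verbatim_run(reference, output)
print(repr(run), round(verbatim_ratio(reference, output), 2))
```

A long shared run suggests memorization of that passage, but a short one proves nothing: a paraphrase can leak the same information without matching character for character.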


LeanderK 1255 days ago

I bet that finding the right prompts won't be easy, so leaks will probably fly under the radar and not be detected immediately. You can't search these weights with command-f. Fun times ahead...


nextaccountic 1254 days ago

Prompt engineering is only getting better. Also,

> You can't search these weights with command-f

Sometimes you can: https://clementneo.com/posts/2023/02/11/we-found-an-neuron


Jensson 1255 days ago

And good luck trying to add data to it without corrupting some other data it has encoded.


osigurdson 1255 days ago

Does this problem disappear when using the Azure version of the service? If not, this is a pretty obvious market need: LLM + privacy.


flangola7 1255 days ago

Most AI companies won't want to offer that. They want to know if someone is using their service to instigate the next mass shooting or ethnic genocide.


Vasyl_R 1255 days ago

A good point. And what about companies that have on-premise storage?


Vasyl_R 1255 days ago

Yes, with OpenAI and with apps like ours, security engineers also have to move to the next level. And companies have to understand that it is context-aware only on the basis of the knowledge base you upload. It cannot go and grab data from your PC just because someone asks it to in chat))

BTW, Thanks for your comments! Appreciate it a lot.


bluejay2387 1255 days ago

This is a well-known problem with this technology (although I haven't seen an official term for it, so we have been calling them "recovery attacks"). It's apparently the reason companies like Amazon have banned internal use of services like ChatGPT. I should add that, while it has been proven to occur, the likelihood of something like this is very low. It's going to be a rare occurrence.


Vasyl_R 1255 days ago

Thanks for sharing. Do you think that if everything were on the client's local server or cloud, there would still be some occurrence of that, even if rare?


bluejay2387 1254 days ago

The problem could still occur, but you would have to be capturing all the queries to your internal LLM systems and then using that data for training. You have complete control of the model, so you could just choose not to do that, and I would think data leaks of this nature would be less of a concern for an internal environment anyway. You would know that only authorized individuals have access to the data. I suppose there could still be a very small chance of leaking data to unauthorized employees, but if a rogue employee wants data they should not have access to, fishing in an LLM would probably be the least productive way to get it. Your access logs for the LLM system would clearly display the attempts.

Some commercial services are starting to offer "Enterprise" licenses that prohibit collecting your data and using it for training, and that would address the concern as well.
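To illustrate the point about access logs, here is a minimal sketch of how repeated probing could be spotted in the logs of an internal LLM system. The tab-separated log layout, the watch list of terms, and the threshold are all assumptions made up for this example; a real deployment would define its own.

```python
from collections import Counter

# Hypothetical watch list; a real deployment would choose its own terms.
SENSITIVE_TERMS = ("salary", "acquisition", "password")


def flag_probing_users(log_lines, threshold=3):
    """Return users with at least `threshold` prompts mentioning a watched term."""
    hits = Counter()
    for line in log_lines:
        # Assumed layout: "<timestamp>\t<user>\t<prompt>"
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        _, user, prompt = parts
        lowered = prompt.lower()
        if any(term in lowered for term in SENSITIVE_TERMS):
            hits[user] += 1
    return sorted(user for user, count in hits.items() if count >= threshold)


# Invented sample entries, for illustration only.
sample_log = [
    "2023-03-01T09:00\talice\tSummarise the onboarding guide",
    "2023-03-01T09:05\tbob\tWhat is the CEO salary?",
    "2023-03-01T09:06\tbob\tList acquisition targets",
    "2023-03-01T09:07\tbob\tAny admin password in the docs?",
]
print(flag_probing_users(sample_log))
```

Keyword matching like this is crude and easy to evade, so it is best read as a starting point for review rather than a detection system.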


zitterbewegung 1255 days ago

If a server was misconfigured, OpenAI's models could have been trained on non-public information. You can also poison OpenAI's dataset if you know a source is being pulled by the service.


Vasyl_R 1255 days ago

At a higher level of understanding, yes. But it would answer queries, that is, be contextually intelligent, only on the basis of the previously uploaded knowledge base. Did I get your point right?
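As a toy illustration of "answers only from the uploaded knowledge base", the sketch below picks context strictly from a list of uploaded passages and returns nothing when no passage matches. It is not a description of how IngestAI or OpenAI actually work; the function names, the stopword list, and the word-overlap scoring are assumptions chosen to keep the example small.

```python
import re

# Small hypothetical stopword list so common words do not count as matches.
STOPWORDS = {"the", "is", "on", "to", "of", "a", "in", "my", "me",
             "do", "be", "are", "what", "how"}


def tokenize(text):
    """Lowercase content words used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, knowledge_base, min_overlap=1):
    """Return the uploaded passage sharing the most words with the question, or None."""
    question_words = tokenize(question)
    best, best_score = None, 0
    for passage in knowledge_base:
        score = len(question_words & tokenize(passage))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None


# Invented knowledge base, for illustration only.
uploaded = [
    "Refunds are processed within 14 days of the request.",
    "Support is available on weekdays from 9 to 17.",
]
print(retrieve("How long do refunds take to be processed?", uploaded))
print(retrieve("Show me the files in my Downloads folder", uploaded))
```

The second question matches nothing that was uploaded, so no context is returned; there is no code path here that reads anything outside the `uploaded` list.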
