Samsung workers made a major error by using ChatGPT | HN Mirror


	Samsung workers made a major error by using ChatGPT (techradar.com)
	78 points by deesep 1170 days ago

6 comments

ftxbro 1170 days ago

The article says "now in the wild after being leaked" but then it says "the data is impossible to retrieve as it is now stored on the servers belonging to OpenAI." So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"? As far as I see from the article, it's not accessible to the general public.

dTal 1169 days ago

>So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"?

The second one. OpenAI is now in possession of Samsung trade secrets. To Samsung, that's "in the wild". And that's a reasonable viewpoint - OpenAI could easily leak chat logs, overfit future models on this data etc, and there's nothing Samsung can now do about it.

brundolf 1170 days ago

If ChatGPT is trained on the data and ChatGPT is accessible to the general public, then the data may as well be accessible to the general public

sterlind 1170 days ago

it seems unlikely to me that ChatGPT is directly trained on chat data. if it is, we should see it know information past its knowledge cutoff. afaik that hasn't happened.

I assume the chat logs are instead training a reward model, which itself is then used as the reward function during RLHF training.

bigyikes 1170 days ago

These models have a very long lead time before they’re released to the public. Maybe GPT-5 is being trained on ChatGPT logs. I’m not sure we’d be able to detect if this was happening.

brundolf 1170 days ago

I think the market is going to explode (if it hasn't already) for on-prem, or at least private, LLMs on par with ChatGPT. This could be served by companies building their own, or by open-source projects, or by OpenAI or OpenAI's competitors

As a side-effect, this feels like a bright spot in the potentially authoritarian trajectory that AI could take as labor becomes less and less valuable. It encourages development of LLMs that compete with the current default option and can be run on more and more limited hardware. Enterprises might even want separate departments, or separate individuals, to be able to run their own models to prevent leakage

eachro 1170 days ago

Open source is a race to the bottom. Seems like the only obvious winner then is people selling the shovels aka NVIDIA.

esafak 1170 days ago

More like a race to the top: you're forgetting all the applications that will be built on top of these open source models.

eachro 1170 days ago

I'm not so sure. Does anyone pay for PyTorch, NumPy, or TensorFlow? In a matter of weeks we've seen llama.cpp and alpaca.cpp released to the public. The barrier to entry in this market is quickly going to zero.

jerojero 1170 days ago

Has the quality of PyTorch and NumPy gone down, though?

These projects are funded by several organisations that rely on them for their operations. The same is true of Linux: just because the software is accessible for free doesn't mean it's not funded.

I really want to see governments putting more funding into open source as well. Public money, public code, as they say.

esafak 1169 days ago

Those are libraries, not applications. Think of copilots, midjourney, etc.

pixl97 1169 days ago

No one pays for Windows, SQL server, and Office. That's why Microsoft is poor.

rvz 1170 days ago

Finally someone is thinking. Stable Diffusion is already at the finish line to the bottom, since their AI model is already open source. Many other open source LLMs and DALL-E 2 alternatives are available competing against O̶p̶e̶n̶AI.com.

They are all gradually catching up and O̶p̶e̶n̶AI.com cannot run their services for free forever or even close to free. Eventually the price hikes will come in.

But as long as the AI industry continues to use inefficient methods of training, fine-tuning, and inference that rely on tons of GPU hardware, NVIDIA will keep smiling for a long time, until there is a true breakthrough in efficient training and inference methods for neural networks on everyday desktops or typical servers.

withinrafael 1170 days ago

Confusing article. It appears the company discovered employees were pasting confidential information into ChatGPT and is assuming that data is now compromised, given OpenAI policies stating that conversations are periodically reviewed and used for training. The data doesn't appear to be accessible to the public directly.

voytec 1170 days ago

It says at the beginning that Samsung allowed employees to do so:

>> The company allowed engineers at its semiconductor arm to use the AI writer to help fix problems with their source code.

> The data doesn't appear to be accessible to the public directly.

Allegedly (reddit), some random chats were recently listed in people's chat lists[1]

[1] https://news.ycombinator.com/item?id=35236660

navanchauhan 1170 days ago

Not allegedly, they actually were.[0]

[0] https://openai.com/blog/march-20-chatgpt-outage

withinrafael 1170 days ago

I forgot about that incident, good call! It's not inconceivable that one of these chats got reported back to Samsung.

bob1029 1170 days ago

How did they gain access to ChatGPT from their offices?

I worked in the ATX factory about a decade ago and the network was very locked-down at the time. You can't even get your phone into the building without a security guard doing things to it. Taking basic stuff like paper in/out is also disallowed.

I would have expected a total ban on personal computing devices leaving the parking lot if this happened during my time there.

mrleinad 1169 days ago

> How did they gain access to ChatGPT from their offices?

>> The company allowed engineers at its semiconductor arm to use the AI writer to help fix problems with their source code.

rvz 1170 days ago

This is exactly why Wall St. and the major banks have banned their employees from using ChatGPT [0] [1] [2]. It is called regulation and compliance.

Some companies drinking the AI koolaid just seem to love learning the hard way.

[0] https://www.wsj.com/articles/jpmorgan-restricts-employees-fr...

[1] https://tech.co/news/wall-street-banks-ban-ai-chatgpt

[2] https://www.bloomberg.com/news/articles/2023-02-24/citigroup...

galleywest200 1170 days ago

For what it's worth, this applies only to the ChatGPT interface. Their terms of service state that API calls are not processed for review in the same manner and are deleted after 30 days.

I suppose these companies can build their own in-house chat application.

blibble 1169 days ago

If I were a compliance officer, I would have zero confidence in the terms of service of a company whose business model is entirely dependent on completely disregarding copyright law.

mdmglr 1168 days ago

Is there any written policy from OpenAI about how they protect chat data? How long they retain it? How they prevent chats from being hallucinated across user sessions?

OpenAI should be worried and self-regulate before the gov’t steps in and does it for them.