Samsung workers made a major error by using ChatGPT (techradar.com)
78 points by deesep 1170 days ago
6 comments

The article says "now in the wild after being leaked" but then it says "the data is impossible to retrieve as it is now stored on the servers belonging to OpenAI." So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"? As far as I see from the article, it's not accessible to the general public.
>So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"?

The second one. OpenAI is now in possession of Samsung trade secrets. To Samsung, that's "in the wild". And that's a reasonable viewpoint: OpenAI could easily leak chat logs, overfit future models on this data, etc., and there's nothing Samsung can now do about it.

If ChatGPT is trained on the data and ChatGPT is accessible to the general public, then the data may as well be accessible to the general public.
It seems unlikely to me that ChatGPT is directly trained on chat data. If it were, we should see it know information past its knowledge cutoff. afaik that hasn't happened.

I assume the chat logs are instead training a reward model, which itself is then used as the reward function during RLHF training.
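
To make the reward-model idea concrete, here is a minimal sketch of the pairwise preference loss commonly described for RLHF reward models. It assumes the chat logs were turned into (preferred, rejected) response pairs; that is only this comment's guess, nothing here is confirmed about how OpenAI uses chat data, and the function name is hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration. The loss is small when the reward
    model scores the preferred response above the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Equal scores mean the model cannot tell the two responses apart.
undecided = pairwise_preference_loss(0.0, 0.0)
# Ranking the preferred response higher lowers the loss; the reverse raises it.
agrees = pairwise_preference_loss(2.0, 0.0)
disagrees = pairwise_preference_loss(0.0, 2.0)
```

The trained model's scalar output would then stand in for the reward during the RLHF policy update, so the raw chat text shapes the policy only indirectly.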

These models have a very long lead time before they’re released to the public. Maybe GPT-5 is being trained on ChatGPT logs. I’m not sure we’d be able to detect if this was happening.
I think the market is going to explode (if it hasn't already) for on-prem, or at least private, LLMs on par with ChatGPT. This could be served by companies building their own, or by open-source projects, or by OpenAI or OpenAI's competitors.

As a side effect, this feels like a bright spot in the potentially authoritarian trajectory that AI could take as labor becomes less and less valuable. It encourages development of LLMs that compete with the current default option and can be run on more and more limited hardware. Enterprises might even want separate departments, or separate individuals, to be able to run their own models to prevent leakage.

Open source is a race to the bottom. Seems like the only obvious winner then is people selling the shovels aka NVIDIA.
More like a race to the top: you're forgetting all the applications that will be built on top of these open source models.
I'm not so sure. Does anyone pay for PyTorch, NumPy, TensorFlow? In a matter of weeks we've seen llama.cpp and alpaca.cpp released to the public. The barrier to entry in this market is quickly going to zero.
Has the quality of PyTorch and NumPy gone down, though?

These projects are funded by several organisations that rely on them for their operations. The same goes for Linux: just because the software is accessible for free doesn't mean it's not funded.

I really want to see governments putting more funding into open source as well: public money, public code, as they say.

Those are libraries, not applications. Think of copilots, Midjourney, etc.
No one pays for Windows, SQL Server, and Office. That's why Microsoft is poor.
Finally someone is thinking. Stable Diffusion has already reached the bottom of that race, since its model is already open source. Many other open source LLMs and DALL-E 2 alternatives are available, competing against O̶p̶e̶n̶AI.com.

They are all gradually catching up, and O̶p̶e̶n̶AI.com cannot run its services for free, or even close to free, forever. Eventually the price hikes will come.

But as long as the AI industry keeps using inefficient methods of training, fine-tuning, and inference that require tons of GPU hardware, NVIDIA will keep smiling, at least until there is a true breakthrough in efficient training and inference of neural networks on everyday desktops or typical servers.

Confusing article. It appears the company discovered employees were pasting confidential information into ChatGPT and is assuming that data is now compromised, given OpenAI policies stating that conversations are periodically reviewed and used for training. The data doesn't appear to be accessible to the public directly.
It says at the beginning that Samsung allowed employees to do so:

>> The company allowed engineers at its semiconductor arm to use the AI writer to help fix problems with their source code.

> The data doesn't appear to be accessible to the public directly.

Allegedly (Reddit), some random chats were recently listed in people's chat lists.[1]

[1] https://news.ycombinator.com/item?id=35236660

Not allegedly, they actually were.[0]

[0] https://openai.com/blog/march-20-chatgpt-outage

I forgot about that incident, good call! It's not inconceivable that one of these chats got reported back to Samsung.
How did they gain access to ChatGPT from their offices?

I worked in the ATX factory about a decade ago and the network was very locked-down at the time. You can't even get your phone into the building without a security guard doing things to it. Taking basic stuff like paper in/out is also disallowed.

I would have expected a total ban on personal computing devices leaving the parking lot if this happened during my time there.

> How did they gain access to ChatGPT from their offices?

>> The company allowed engineers at its semiconductor arm to use the AI writer to help fix problems with their source code.

This is exactly why Wall St. and the major banks have banned their employees from using ChatGPT [0] [1] [2]. It is called regulation and compliance.

Some companies drinking the AI Kool-Aid just seem to love learning the hard way.

[0] https://www.wsj.com/articles/jpmorgan-restricts-employees-fr...

[1] https://tech.co/news/wall-street-banks-ban-ai-chatgpt

[2] https://www.bloomberg.com/news/articles/2023-02-24/citigroup...

For what it's worth, this applies only to the ChatGPT interface. Their terms of service state that API calls are not processed for review in the same manner and are deleted after 30 days.

I suppose these companies can build their own in-house chat application.

If I were a compliance officer, I would have zero confidence in the terms of service of a company whose business model is entirely dependent on completely disregarding copyright law.
Is there any written policy from OpenAI about how they protect chat data? How long they retain it? How they prevent chats from being hallucinated across user sessions?

OpenAI should be worried and self-regulate before the gov’t steps in and does it for them.