by moonchrome 1246 days ago
So you are providing sensitive business information/facts to a third-party service that's likely going to use it for training and analysis, store it, etc.?

It should be fine for most places, I guess, but I suspect a decent number will have a problem with this.

This is my main reservation about Copilot as well (quality issues aside).


> So you are providing sensitive business information/facts to a third-party service that's likely going to use it for training and analysis, store it, etc.?

Every business needs to make its own decision. Personally, I'm not worried about OpenAI using my data, but I understand others might be. That said, I already give Amazon literally all my data about my business via AWS, and Google gets a copy of all my documents, so providing this data isn't entirely unprecedented.

The difference is that GPT is training future versions based on the information you provide. It could potentially be returned to a future user.

AWS does no such thing.

Are you talking about the free to use ChatGPT application or the paid APIs? Very different things.
I don't know a lot about this. If you use a paid API, does that mean they promise not to use it for training?

Let's say I have a bunch of Excel files and a bunch of accompanying Word files (based on the info in each Excel file and, to a lesser extent, on information from the real world). Can I use the paid API to train on my data, so that when I provide a new Excel file it generates the accompanying text? And can I keep them from using the information in those files when generating results for other users?

I'm just responding to the parent's point about the disclaimer. That disclaimer is only shown to ChatGPT users.

I get the general impression, even on this site, that people think OpenAI's sophisticated text generation capability is only available through ChatGPT, which is not the case at all. It's just the only _free_ product.

You're right, I was thinking about ChatGPT. Don't know how it works for the APIs.
It's easy to replace terms with something else in order to keep information confidential.
It's also easy to forget about it. Or get complacent. Or not realize some of the things you didn't replace should have been.

But then, Grammarly managed to make a successful business out of a keylogger. And if you're working at a mid-to-large company and developing on Windows, chances are your corporate-mandated AV is sending every single binary you build to the AV company's servers. And so on and so on. So I suppose this ship has already sailed, and you may as well plug your company's Exchange server and SharePoint into ChatGPT and see if it can generate your next PowerPoint presentation for you.

I always go out of my way to protect the information and IP I'm contractually obliged to protect, but it's getting increasingly hard these days. It's also very frustrating when you notice and report a new process that silently sends company IP to third parties to do $deity knows what with it, and you discover that instead of being concerned, the people you report your findings to don't care and don't understand why you're making a fuss out of it.

If you replace terms with something else, what sort of value can you get from the responses? On the one hand, the LLM doesn't know the code words you are using; on the other, if you restrict yourself to well-understood near synonyms, you are not really obfuscating anything.
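For illustration, the term-replacement approach discussed above could look roughly like the sketch below. It is a minimal example under loud assumptions: `pseudonymize` and `restore` are hypothetical helper names, you supply the list of sensitive terms yourself, matching is exact and case-sensitive, and the actual call to ChatGPT or the API is left out. It does nothing about terms you forget to list, and the model still loses whatever meaning the masked terms carried.

```python
import re

def pseudonymize(text, terms):
    """Swap each listed term for a placeholder like <TERM0>.

    `terms` is a hypothetical, hand-maintained list of confidential strings.
    Returns the masked text plus a placeholder -> term mapping for restore().
    """
    # Drop empty strings and sort longest first, so "Acme Corp" is matched
    # before "Acme" at the same position (regex alternation tries left to right).
    ordered = sorted({t for t in terms if t}, key=len, reverse=True)
    if not ordered:
        return text, {}
    pattern = re.compile("|".join(map(re.escape, ordered)))
    forward = {}  # term -> placeholder

    def swap(match):
        term = match.group(0)
        if term not in forward:
            forward[term] = f"<TERM{len(forward)}>"
        return forward[term]

    # A single pass means placeholders are never re-scanned and re-replaced.
    masked = pattern.sub(swap, text)
    return masked, {placeholder: term for term, placeholder in forward.items()}

def restore(text, mapping):
    """Put the original terms back into a response that uses the placeholders."""
    if not mapping:
        return text
    pattern = re.compile("|".join(map(re.escape, mapping)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

masked, mapping = pseudonymize(
    "Acme Corp sues Acme Labs over Project Falcon",
    ["Acme Corp", "Project Falcon", "Acme"],
)
print(masked)                    # <TERM0> sues <TERM1> Labs over <TERM2>
print(restore(masked, mapping))  # Acme Corp sues Acme Labs over Project Falcon
```

Note that "Acme Labs" only gets partially masked because "Acme Labs" itself isn't in the list, which is exactly the "forget about it" failure mode: the mapping only protects what someone remembered to put in it.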
> The difference is that GPT is training future versions based on the information you provide.

Do they say so in their T&C or elsewhere, and if not, how do you know?

Would have been easier to read the T&C than to post this comment. The answer is yes. OpenAI specifically says in their T&C that your content will be used to train their models.
Not only that, it also gives a popup that states this clearly, as well as a request to not input confidential information when you first start using it.

Do people just not read anymore?

> Do people just not read anymore?

Well, you're in a thread that's fundamentally about how people just don't write anymore...

It literally says so in a popup which must be accepted before first use. No need to dig in the T&Cs.
Fair, but if you're writing a communication where you care about wording, it's probably going to be public or semi-public anyway.
In the kinds of organizations I'm familiar with, something internal that doesn't need to be understood can usually just not be written. Writing that exists because a written document is required is usually for the public, for other organizations, or for government.
I didn't think Copilot was actually sending your source code back to GitHub? It's trained on OSS.
Copilot sends chunks of your code to the server for every suggestion (whether or not you accept it). That's the only way it could possibly work, since the model is running on a remote server.

https://thakkarparth007.github.io/copilot-explorer/posts/cop...

Snippets of your local code are sent to the API as part of the Copilot request context.
When I looked into this a little, I found that Copilot for Business states they do not record any of the info you send them, though obviously you are still sending code to a third party. I could not find any info about their policies for “normal” Copilot, which I assume means they are recording all the code sent to it and using it to further train the model.
Still hard to trust. Many companies save info for training/QA purposes and hide the details in obscure language. Voice assistants, for example. Didn't Roomba get in trouble recently over photos that were uploaded and shared with contractors for QA, who ultimately shared the embarrassing photos more publicly?
I hope I never have to work at a place like this.