


	by whimsicalism 871 days ago
	do you not trust the setting OAI provides to exclude your conversation from training data?

4 comments

tempusalaria 871 days ago

1) OpenAI has consistently gone back on commitments it has made

2) Sam Altman has a shady track record publicly, and if you believe the things people say privately he has consistently done business very dishonestly throughout his career. He is the CEO, and virtually the entire executive team are people he brought in from his network. It’s his company.

3) To give one example of many, OpenAI recently changed the terms of ChatGPT so that web users' conversations can now be trained on (and if you want to save any chats you must opt in). Presumably this also applies to all conversations you had under the old policy, despite them saying they would never train on those conversations.

I could go on at length…

anonylizard 871 days ago

If you don't trust OpenAI, you can choose to trust Microsoft. Azure OpenAI is 3 months behind OpenAI and costs 3x as much, but you get more 'security' and better performance.

tempusalaria 871 days ago

Yes and I recommend that people interested in GPT-4 use this service as it’s isolated from OpenAI and it is the absolute best model right now.

That said, there are 3 quite worrying future possibilities, all stemming from the level of investment and commitment by Microsoft in OpenAI - a huge percentage of Azure’s value is bet on an exclusive partnership with them. That gives OpenAI a lot of leverage. And customer data is a very tempting cookie jar for building a long-term moat for these companies.

Possibility 1: someone overtakes OpenAI's models and you’re stuck on Azure, which won’t offer that model

Possibility 2: OpenAI decides to break with Microsoft and you’re stuck with a useless application. AWS Bedrock is far less likely to leave an AI application obsolete.

Possibility 3: OpenAI puts pressure on Microsoft to loosen the data protections around the service, or goes around them entirely. This is a particular concern as the systems in the service become more complex and agentic and more difficult for Microsoft to audit. Model weights are highly opaque and Microsoft cannot trace the exact possible behaviour of these systems. What if GPT-6 changes its own weights during inference, for example? How can Microsoft ever tell whether that’s a truly critical piece of functionality or a proxy method to access customer data?

whimsicalism 871 days ago

Good luck to OAI breaking with Microsoft when Microsoft has a 49% ownership share in the company and owns all of its GPUs.

ParetoOptimal 871 days ago

I don't see how anyone could trust Microsoft or OpenAI given their track records.

CharlesW 871 days ago

> Azure OpenAI is 3-months behind OpenAI…

How is it 3 months behind if you get access to current OpenAI models?

anonylizard 871 days ago

You don't.

It took like 2 extra months for Azure OpenAI to get GPT-4 Turbo. There's a noticeable delay between OpenAI deploying their latest model and Microsoft managing to shove it into Azure.

drittich 870 days ago

And it's not just the models. E.g., the Assistants API is not available yet in Azure, and there is no expected ship date for it. But I'm confident it's coming.

CharlesW 871 days ago

Great to know, thank you.

whimsicalism 871 days ago

Hm. I disagree that executive level shenanigans translate to real world non-compliance with privacy law when there is an explicit request to delete/not store data.

Nevermark 871 days ago

> I disagree that executive level shenanigans translate to real world non-compliance with ________ law

Uber

Airbnb

Facebook

The entire financial industry

Also, every huge corporation that paid a vast (but relative to their market cap, insignificant) settlement, long after the incidents in question, without admitting guilt.

Corporations have essentially culled vast numbers of humans until forced to stop. What’s a little lucrative nosiness in that context?

The massive tide of legally gray (including very dark gray) media hoovered up by training data vacuums isn’t exactly an industry secret. Whether any known player is more serious about protecting data source interests over their own ambitions remains to be verifiably demonstrated.

Someone once said, “it is easier to ask forgiveness than permission”. Someone might add, “if ever, or only performatively for congressional testimony theatre purposes, and definitely only after requiring a more punitive legal/regulatory moat to lock in the benefits of your non-compliance”.

ANY sketchiness should be taken seriously. Not making accusations. But sooner or later, somebody is going to say, “it’s a lot easier to cry and complain than claw back information someone took from you, who has billions to spend on lawyers”.

whimsicalism 871 days ago

None of these cases involve unambiguously claiming that you will delete data and then not doing it.

Closer would be FTX which unambiguously claimed one thing and did the opposite, but I do not think OAI has FTX level of dysfunction.

Nevermark 871 days ago

Lots of companies have interpreted “deleting data” in creative ways, or just declared “oopsie” when caught. [0][1][2]

[0] Google: lied about deleting data (Google as a verb)

[1] Google: lied about deleting data (Google as a noun) https://arstechnica.com/tech-policy/2023/02/us-says-google-r...

[2] Google: lied about deleting data (Google as a definition in Webster’s dictionary -> Rickroll)

hackerlight 870 days ago

Even if there weren't shenanigans, it's valid to be concerned. Incentives lead to outcomes. Companies do a cost-benefit analysis: if the legal/reputational costs are less than what they stand to gain, history shows that they'll do the thing and then lie about it. Sam might be uniquely resistant to this due to a personal ethical code, but it's impossible to know for sure given that I can't read his mind.

rgbrgb 871 days ago

about as much as I trust 23andme to keep my genetic data private

I give their intent the benefit of the doubt, but I’ve been on the other side of too many data collection systems to trust that theirs is foolproof and more secure than my local machine.

whimsicalism 871 days ago

To me there is a significant difference between permanently storing data while “keeping it private” (23andme is legally prohibited from deleting your genetic data) and not using it and deleting it after 30 days.

fwiw 23andme genetic data (distinct from summary ancestry data) has not leaked afaik

anonylizard 871 days ago

Your average SQL query is worthless. There's no special sauce in SQL queries; their value comes from the dataset they run on.

These small SQL LLMs are indeed worthless, since LLM performance on SQL queries is quite limited right now, every % of accuracy matters.

rgbrgb 871 days ago

With data privacy decisions I like to think about how the policy/reasoning would sound in a NYT headline or congressional testimony. Your stance does not inspire trust for me.

0 employment contracts or user facing ToS I've read make a distinction between sharing schema vs data with unvetted third parties. In my opinion, schemas reveal pretty valuable info about an application (and they're absolutely valuable in aggregate because you can use them to train AI data scientists).

Maybe I sound like a curmudgeon but since I have the option of running the AI locally with a 5 percentage point accuracy loss, I absolutely will. If GPT-4 was 100% that would be different because you could build totally different things, but 83% has most of the same design problems as 78%.

whimsicalism 871 days ago

Did you mean to reply to a different comment?

ren_engineer 871 days ago

It's not about trust: they will always be indirectly controlled by the US government, which could force them to leak/release your data. OpenAI is already working with the US military and has removed its restrictions on allowing its AI to be used for military purposes.

https://time.com/6556827/openai-us-military-cybersecurity/