| HN Mirror

	by zb3 1093 days ago
	So now, instead of sending the data to OpenAI, we send it to Cape? I know that you "promise to keep it secure", but I can only trust you, right? Something like this should IMO be done on-premises.

4 comments

garciasn 1093 days ago

Yes, this has been my #1 issue with all the VC-backed startup dollars flowing lately. They are all 100% reliant on OpenAI and are just shuttling private information and pretending OpenAI's terms are good enough protection.

So far, most of the companies we have spoken to are literally SHOCKED that we require SOC 3 (one company even told me they'd never heard of SOC 3) and that everything needs to be hashed before it goes out and mapped back to the actual values on our end. They think we're being too cautious and are really just trying to get to a sale without understanding that it's literally NOT something we can do, and NO ONE else should be doing it either.
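A minimal Python sketch of the "hash it before it goes out, map it back on our end" approach, assuming the sensitive values can be found with a regex. The `Pseudonymizer` class, the email pattern and the `<PII_xxxxxxxxxxxx>` placeholder format are hypothetical illustrations, not any vendor's API. It uses a keyed HMAC rather than a bare hash, because unkeyed hashes of low-entropy values such as emails or names can be reversed by hashing likely guesses.

```python
from __future__ import annotations

import hashlib
import hmac
import re
import secrets


class Pseudonymizer:
    """Swap sensitive values for keyed-hash placeholders; keep the mapping local."""

    def __init__(self, key: bytes | None = None) -> None:
        # Keyed HMAC, not a bare hash: unkeyed hashes of emails or names can be
        # reversed by hashing likely guesses. The key never leaves our side.
        self._key = key if key is not None else secrets.token_bytes(32)
        self._mapping: dict[str, str] = {}  # placeholder -> original value

    def _placeholder(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        # Same input + same key -> same placeholder, so repeats stay consistent.
        return f"<PII_{digest.hexdigest()[:12]}>"

    def redact(self, text: str, pattern: re.Pattern[str]) -> str:
        """Replace every match of `pattern` with its placeholder."""
        def substitute(match: re.Match[str]) -> str:
            original = match.group(0)
            token = self._placeholder(original)
            self._mapping[token] = original
            return token

        return pattern.sub(substitute, text)

    def restore(self, text: str) -> str:
        """Map placeholders in a model response back to the original values."""
        for token, original in self._mapping.items():
            text = text.replace(token, original)
        return text


if __name__ == "__main__":
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # illustrative pattern only
    p = Pseudonymizer()
    outbound = p.redact("Contact alice@example.com about the invoice.", EMAIL)
    print(outbound)             # the address is replaced by a placeholder
    print(p.restore(outbound))  # original text, restored locally
```

Only the placeholder text would leave the building; the key and the `_mapping` dict stay local. A regex only catches what it explicitly describes, which is one reason robust de-identification of free text tends to need a language model.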

gavinuhma 1093 days ago

Good points. I think the rabbit hole of OpenAI sub-processors is not commonly understood.

The humans at TaskUS are moderating prompts, and then you have Azure, Cloudflare, and Snowflake as sub-processors, each with their own list of sub-processors, and on and on.

https://platform.openai.com/subprocessors

Data breaches can happen, so for any data that you throw over the wall to OpenAI, you must be willing to accept that it could become public.

gavinuhma 1093 days ago

Yep! The more you can do locally the better. An entirely local LLM is the best for data privacy and security. Any time data leaves it poses some risk.

The de-identification itself requires a complex language model, which brings its own operational complexity and cost. At Cape we're going as far as we can to offer a secure API that's self-serve and easy to use, to make these features accessible to developers, but it does require trust in Cape and in the underlying AWS Nitro Enclaves that we use. Client-side attestation is a security feature that can help by giving the client cryptographic verification of the secure enclave. But local is always better when possible!
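Very roughly, client-side attestation works like this: the client sends a fresh nonce, the enclave returns a signed attestation document, and the client checks that the document is authentic, fresh, and reports the expected code measurements (PCRs). The Python sketch below covers only the last two checks. It assumes the signature and certificate chain have already been verified and the CBOR payload decoded into a dict with `nonce` and `pcrs` keys; the function names and the expected PCR values are illustrative assumptions, not Cape's client.

```python
from __future__ import annotations

import hmac
import secrets


def make_nonce() -> bytes:
    """Fresh random challenge per session, so an old attestation can't be replayed."""
    return secrets.token_bytes(32)


def check_attestation_claims(
    payload: dict,
    expected_pcrs: dict[int, bytes],
    nonce: bytes,
) -> bool:
    """Check freshness and measurements of an ALREADY VERIFIED attestation payload.

    Assumption: the COSE_Sign1 signature and the certificate chain up to the
    AWS Nitro root were verified elsewhere, and the CBOR payload was decoded
    into a dict. This function only compares the decoded claims.
    """
    returned_nonce = payload.get("nonce")
    if not isinstance(returned_nonce, bytes) or not hmac.compare_digest(returned_nonce, nonce):
        return False  # stale or replayed document
    pcrs = payload.get("pcrs", {})
    for index, expected in expected_pcrs.items():
        actual = pcrs.get(index)
        if not isinstance(actual, bytes) or not hmac.compare_digest(actual, expected):
            return False  # enclave is not running the code we expect
    return True
```

In a real client, `expected_pcrs` would be pinned to the published measurements of the enclave image (for Nitro Enclaves, PCR0 is the hash of the enclave image file), so a passing check ties the response to a specific, auditable build.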

dbesemer 1093 days ago

I will add that running your own private LLM is complicated and costly, and that a private LLM will not (at this point) be as capable as GPT-4. So while running a private LLM will certainly be the right solution for some, Cape's offering makes improved privacy available to many.

survirtual 1093 days ago

Right...

I want fewer parties involved with secure data, not more. This should be an on-prem solution with no external network access and no direct calls to OpenAI: a call is made to this service to obfuscate the data, then another call goes to OpenAI, all managed by a coordinating mechanism that is open source / trusted.
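A rough Python sketch of the coordinating mechanism described above, assuming both steps are injected as plain functions: `deidentify` returns the redacted text plus a placeholder-to-original mapping, and `complete` is whatever client calls the external model. Both names and the `Coordinator` class are hypothetical. The point is that only the redacted prompt reaches the outside call, and re-identification happens locally.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical interfaces: plug in a local de-identifier and an LLM client.
Deidentify = Callable[[str], Tuple[str, Dict[str, str]]]  # text -> (redacted, placeholder->original)
Complete = Callable[[str], str]                           # redacted prompt -> model reply


@dataclass
class Coordinator:
    deidentify: Deidentify  # runs on-prem, no external network access
    complete: Complete      # the only step whose input leaves the network

    def run(self, prompt: str) -> str:
        redacted, mapping = self.deidentify(prompt)
        reply = self.complete(redacted)  # external call sees redacted text only
        # Re-identify locally; the mapping never leaves this process.
        for placeholder, original in mapping.items():
            reply = reply.replace(placeholder, original)
        return reply


if __name__ == "__main__":
    def fake_deidentify(text: str) -> tuple[str, dict[str, str]]:
        return text.replace("Alice", "[NAME_1]"), {"[NAME_1]": "Alice"}

    def fake_llm(text: str) -> str:
        print("sent to model:", text)
        return f"Summary: {text}"

    print(Coordinator(fake_deidentify, fake_llm).run("Alice owes $40"))
```

Keeping the coordinator this small is deliberate: it is the piece that would need to be open source and auditable, while the de-identifier and the model client can be swapped independently.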

Better yet, maybe LLMs should be required to have their weights released, considering they are trained on the collective body of human knowledge. It seems strange to use a significant share of publicly available human knowledge and then deny everyone access to the weights.

gavinuhma 1093 days ago

Entirely local and 0 sub-processors is the ideal! I hope we are trending that way as an industry.

stevelacy 1093 days ago

They are using AWS Nitro instances for their enclaves. These can absolutely be run on-prem with self-hosted licensed software to perform the computational redaction.
