| Reading the whitepaper, the inference provider still has the ability to access the prompt and response plaintext. This scheme does seem to guarantee that plaintext cannot be read for all other parties (e.g. the API router), and that the client's identity is hidden and cannot be associated with their request. Perhaps the precise privacy guarantees and allowances should be summarized in the readme. With that in mind, does this scheme offer any advantage over the much simpler setup of a user sending an inference request: - directly to an inference provider (no API router middleman) - that accepts anonymous crypto payments (I believe such things exist) - using a VPN to mask their IP? |
I'm not sure I understand what you mean by inference provider here? The inference workload is not shipped off the compute node once it's been decrypted to e.g. OpenAI, it's running directly on the compute machine on open source models loaded there. Those machines are cryptographically attesting to the software they are running. Proving, ultimately, that there is no software that is logging sensitive info off the machine, and the machine is locked down, no SSH access.
This is how Apple's PCC does it as well, clients of the system will not even send requests to compute nodes that aren't making these promises, and you can audit the code running on those compute machines to check that they aren't doing anything nefarious.
The privacy guarantee we are making here is that no one, not even people operating the inference hardware, can see your prompts.