| Much needed. Great opportunity. Happy hunting. "...the API provides synthetic data samples with the same schema and statistical distribution by default." Neat. Repurposing test data generators. I like it. "Our model is a software license to run on our clients’ cloud." Just to confirm my understanding: Sarus never sees the client's data? Cool. -- I'm fine with differential privacy. I haven't read the most recent papers, so I'm a little out of date. That said... Encrypting data at rest at the field level is an important missing piece of the future perfect privacy stack. Just like how proper password vaults work: salt, hash, encrypt. Never store the actual password. The book Translucent Databases details clever examples of this strategy for miscellaneous use cases. Never store PII as plaintext. Translucent databases and differential privacy are orthogonal. I have no ideas on how to productize (or SaaS-atize) translucent strategies. |
And indeed, data encryption and privacy-preserving analysis (what Sarus does) are quite orthogonal. You can combine them for some use cases, so that data are protected on the machine and also cannot be re-identified from query results. For example, on our infra (used only for demos, not for clients), all data are encrypted at rest by the cloud provider by default. You could even try to add FHE (fully homomorphic encryption), but that's quite complex and probably wouldn't support many types of data analysis.
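To make the "orthogonal but combinable" point concrete, here is a rough sketch in Python (standard library only). It is not how Sarus or the Translucent Databases book does it; every name here (`protect_field`, `dp_count`, the row layout) is made up for illustration. The first half replaces a PII field with a keyed hash (HMAC-SHA256, swapped in for the "salt, hash" step) so the plaintext is never stored. The second half answers a count query with Laplace noise, which is the textbook differential privacy mechanism for counts.

```python
import hashlib
import hmac
import random
import secrets


def protect_field(value: str, key: bytes) -> str:
    """Return a keyed hash of a PII field; the plaintext is never stored.

    Assumption: a secret key kept outside the database plays the role of the salt.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count: a count has sensitivity 1, so the noise scale is 1 / epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical key; would live in a KMS, not in code
    raw = [("alice@example.com", 34), ("bob@example.com", 51), ("carol@example.com", 47)]

    # Storage side: only the token and the non-identifying attribute are kept.
    stored = [{"email_token": protect_field(email, key), "age": age} for email, age in raw]

    # Query side: the analyst only ever sees a noisy aggregate.
    rng = random.SystemRandom()
    print(dp_count(stored, lambda r: r["age"] > 40, epsilon=0.5, rng=rng))
```

The two halves protect against different things: the keyed hash limits what someone with access to the stored rows can read, while the noise limits what can be inferred about one person from query answers. Neither replaces the other.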