HN Mirror



	by sandGorgon 3278 days ago
	Will this be useful for machine learning in the same way as this? https://medium.com/numerai/encrypted-data-for-efficient-mark...


maffydub 3278 days ago

Open Mined (http://openmined.org/) are looking at this from the other angle: they share the encrypted neural network with their users so that users can train it without sharing their data at all. Each user computes encrypted gradients on their own data, and the company that wants to train the neural network collates and decrypts them.

(Not affiliated in any way, but I went to a really interesting talk on this a few weeks ago at the London Machine Learning meetup group: https://www.meetup.com/London-Machine-Learning-Meetup/events...)
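The aggregation step described above can be sketched with an additively homomorphic scheme. This is a minimal toy, assuming Paillier encryption (a standard additively homomorphic scheme; the thread does not say which scheme Open Mined uses), tiny hard-coded primes that are completely insecure, and a hypothetical fixed-point encoding of gradients. Multiplying Paillier ciphertexts adds the underlying plaintexts, so whoever collates the ciphertexts can sum them without decrypting any individual gradient, and only the private-key holder can decrypt the total. Key management is not modeled.

```python
import math
import random

# Toy Paillier keypair with tiny hard-coded primes: INSECURE, illustration only.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                     # common simplified generator choice
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function lambda(N)
MU = pow(LAM, -1, N)          # modular inverse; valid because G = N + 1

SCALE = 1000  # hypothetical fixed-point scale for real-valued gradients


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private key (LAM, MU)."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N * MU) % N


def encode(x: float) -> int:
    return round(x * SCALE) % N  # negatives wrap around modulo N


def decode(v: int) -> float:
    if v > N // 2:  # upper half of the range represents negative values
        v -= N
    return v / SCALE


def aggregate(ciphertexts: list[int]) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N_SQ
    return total


# Hypothetical per-user gradients for a single weight.
user_gradients = [0.125, -0.5, 0.75]
encrypted = [encrypt(encode(g)) for g in user_gradients]
summed = decode(decrypt(aggregate(encrypted)))
print(summed)
```

Here the collating step only ever touches ciphertexts; the sum of the three gradients is recovered after a single decryption.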

sandGorgon 3278 days ago

Oh, it's very similar to Numerai. Pretty cool. I think this is their library: https://github.com/OpenMined/R-Homomorphic-Encryption-Packag... (They haven't ported it to Python, so they have to use an R bridge to use it in Jupyter. I wonder why.)

proofofstake 3278 days ago

Yes, but not to the same degree. Numerai uses structure-preserving encryption / neural encryption, which allows people to run any existing machine learning algorithm on the data. With fully homomorphic encryption you would need specialized algorithms, which are much harder to design and also run slower.
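To make "specialized algorithms" concrete: fully homomorphic schemes evaluate additions and multiplications on ciphertexts, so non-polynomial steps such as a sigmoid activation typically have to be replaced by a polynomial. A minimal sketch, assuming a degree-3 Taylor approximation around 0 (one common choice, not taken from any package mentioned in this thread); the function names are hypothetical, and the code runs on plain floats rather than ciphertexts.

```python
import math


def sigmoid(x: float) -> float:
    """Exact logistic sigmoid: needs exp and division, which FHE lacks natively."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_poly3(x: float) -> float:
    """Degree-3 Taylor approximation of the sigmoid around 0.

    Uses only additions and multiplications (dividing by a public constant
    is a multiplication by a plaintext constant), i.e. operations an FHE
    scheme can evaluate on encrypted inputs.
    """
    return 0.5 + x / 4 - x ** 3 / 48


for x in (0.5, 1.0, 4.0):
    print(f"x={x}: exact={sigmoid(x):.4f} poly={sigmoid_poly3(x):.4f}")
```

The approximation is close near 0 but diverges badly for larger inputs (at x = 4 it falls far below the true sigmoid), so an encrypted algorithm has to keep its inputs in a bounded range. That extra design constraint is part of why such algorithms are harder to build.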

lkowalcz 3278 days ago

Worth pointing out that Numerai actually doesn't use encryption in any standard sense (including structure-preserving encryption), but instead seems to be using some heuristic method of obfuscating their data.

Their (closed-source) method of obfuscating their data apparently does preserve the structure of the data, but calling it "structure-preserving encryption" is misleading imo, since it risks confusing it with standard notions of encryption and structure-preserving encryption, which have much stronger security guarantees.

(Their marketing seems to encourage this conflation by, for example, citing academic advances in standard notions of homomorphic encryption and SPE and implying that these advances have enabled Numerai's technology.)

https://medium.com/numerai/encrypted-data-for-efficient-mark...

proofofstake 3278 days ago

They switched from structure-preserving encryption to neural encryption (using a neural net's layer activations).

> Just a few months ago this package was released by Louis Aslett at Oxford http://www.louisaslett.com/HomomorphicEncryption/. Louis helped me use his package to do Fan and Vercauteren homomorphic encryption on my dataset. Because the ciphertexts are polynomials, it's not too easy for an average data scientist to use the data. That's why I came up with more chill ways of encrypting Numerai's data that the article mentions, like order-preserving encryption. There's a security vs. ease-of-use trade-off, for sure. But homomorphic encryption is a real thing.

https://www.reddit.com/r/MachineLearning/comments/3zvuge/enc...

Louis Aslett authored https://arxiv.org/abs/1508.06574
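The quote mentions order-preserving encryption as one of the "chill" alternatives. As a toy sketch only, assuming integer features in a small known domain: a key-seeded, strictly increasing lookup table maps each value to a larger integer, so comparisons (and therefore sort orders and tree-style split decisions) survive while the raw values are hidden. This is not Numerai's method and not a secure OPE scheme; `make_ope_table` and its parameters are hypothetical.

```python
import random
from itertools import accumulate


def make_ope_table(key: int, domain_size: int, max_gap: int = 1000) -> list[int]:
    """Toy order-preserving table for plaintexts 0..domain_size-1.

    Each plaintext maps to a running sum of key-seeded random positive gaps,
    so the mapping is strictly increasing. INSECURE: illustration only.
    """
    rng = random.Random(key)
    gaps = [rng.randint(1, max_gap) for _ in range(domain_size)]
    return list(accumulate(gaps))


table = make_ope_table(key=1234, domain_size=100)
plaintexts = [42, 7, 99, 7, 63]
ciphertexts = [table[p] for p in plaintexts]

# Sorting by ciphertext yields the same order as sorting by plaintext,
# so order-based learners see the same structure in both.
order_plain = sorted(range(len(plaintexts)), key=lambda i: plaintexts[i])
order_cipher = sorted(range(len(ciphertexts)), key=lambda i: ciphertexts[i])
assert order_plain == order_cipher
```

By design this leaks the order of the data, and with a table this simple the ciphertexts grow roughly linearly with the plaintexts, so approximate magnitudes leak too. That leakage is the security vs. ease-of-use trade-off the quote describes.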

lkowalcz 3278 days ago

Thanks for that link! I was never able to find any details from someone who works for Numerai (or at least claims to).

I still don't think it's fair to market their method as comparable to Aslett's scheme or "standard" notions of homomorphic/order-preserving encryption, no matter how "chill" they are :)

proofofstake 3278 days ago

Do you think neural encryption is closer to encryption? GAN-style: one network encrypts while preserving structure, and another network tries to reverse-engineer the original features.

Edit: No specific sources for what Numerai is using, but in general: https://arxiv.org/abs/1610.06918 "Learning to Protect Communications with Adversarial Neural Cryptography".

Edit2: Yes, in general, I would say "yes, this is a valid form of encryption". But I do agree that their marketing was perhaps a bit too optimistic. I have no problem calling it "obfuscation" either (I just think their method of "obfuscation" is way more advanced than removing headers and normalizing to the 0-1 range).

lkowalcz 3278 days ago

I've never heard of neural encryption before, do you have a good source for me?

From the Wikipedia page for "Neural cryptography", it seems like there's been some success in using NNs for cryptanalysis, but not for constructions...

Edit: Do you mean the Google GAN experiment? (https://arxiv.org/pdf/1610.06918v1.pdf) Ahh ok, well at least there seems to be an attempt at defining a security model for this (security against some other NN). I don't really believe the security model is realistic (how do we know NNs are really that effective as adversaries?), but at least there is a model, so calling that "encryption" sits somewhat better with me. I'm pretty sure this is not what Numerai is using, since it seems like it would not produce ciphertexts with the structure needed to perform ML operations on them.

Edit2: Maybe you're right and it is more advanced. In any case, as a crypto nerd I wish they would disclose what they are actually doing and why, instead of tantalizingly suggesting that they have made (what would be) a breakthrough in a practical use case of advanced encryption schemes without saying how.

nicpoulos 3277 days ago

This is a great and really important point. But shouldn't warranting the "encryption" label come down to the security guarantee of the algorithm in question, in the use cases it's meant for? I don't see why non-traditional data obfuscation algorithms can't be equally valid methods of "encryption" just because they are architecturally different. After all, plenty of traditional encryption approaches have been deprecated and deemed insufficiently secure. So it'd be a bigger problem if the definition of encryption became so narrow that practitioners assume security guarantees based on use of the term alone.

IMO the key is that new encryption methods demonstrate levels of security in their intended use cases that users deem sufficiently strong. I agree that all closed-source methods leave room for misleading statements (and I won't speak to Numerai, as I don't have any additional insight into their approach). But any algorithm can and should be benchmarked, both generally and against common standards, to avoid confusion. Serious commercial & public sector users will be quick to insist on this, especially in the absence of massive social proof.

My thought here is that if we draw the line on what warrants "encryption" by saying it's got to be a traditional key-based system, or sufficiently similar to existing approaches, we risk stifling innovation in the space by denying new entrants an industry-standard term that buyers are trained to seek. Non-traditional doesn't necessarily mean non-secure. Would love to hear your thoughts on this.

lkowalcz 3276 days ago

I think it's appropriate to restrict usage of the word "encryption" to methods that come with some justification for their security.

I don't think that doing this stifles innovation. (In fact, I think requiring crypto innovations to have security justifications is probably better overall for innovation in the field.)

nicpoulos 3273 days ago

Your specific statement was valid and good-natured, so I didn't mean to attack you here. All I'm arguing against is the adoption of a legal or de facto definition of "encryption" based on architectural similarity to currently popular methods, rather than on actual evaluation in situ. Almost all practical applications of crypto trade security for functionality to some extent, and the requirements for one or the other vary by use case. Moreover, future security gains likely won't all be made on traditional grounds (e.g. increased resistance to brute-force attacks). That's why I'm hesitant to support statements like "X company is misleading us by using the term 'encryption' because its closed-source approach doesn't appear to adhere to conventional notions on cursory inspection."

Much practical innovation comes from closed-source applications whose peer review comes in the form of commercial lab tests. Overly ossified technology standards & labels often force CIOs / CTOs / CISOs to build artificial barriers into their corporate procurement processes for the sake of optics. In addition to engendering "check-the-box" complacency, these barriers absolutely stifle startup-driven innovation.

sandGorgon 3278 days ago

Interesting. Any insight into what specific algorithms these are? I would love to play around with my own data this way.

It wouldn't be a simple hash or something, would it?

lkowalcz 3278 days ago

I suspect they might just be anonymizing their features (removing any labels), then normalizing them to [0,1].
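That guess can be sketched in a few lines. This is a hypothetical reconstruction of the guess, not Numerai's actual pipeline; the column names, the `feature{i}` naming and the per-column min-max scaling are assumptions. Min-max scaling is affine per column, so it preserves order and relative distances, which is why standard ML algorithms still work on the output. Unlike a hash, which would scramble that structure, it has no key, and anyone who learns a column's min and max can invert it exactly (`original = lo + x * (hi - lo)`).

```python
def anonymize_and_normalize(columns: dict[str, list[float]]) -> dict[str, list[float]]:
    """Drop column names and min-max scale each column to [0, 1].

    Hypothetical illustration of the guess above, not Numerai's code.
    """
    result = {}
    for i, values in enumerate(columns.values(), start=1):
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        result[f"feature{i}"] = [(v - lo) / span if span else 0.0 for v in values]
    return result


# Hypothetical raw features with meaningful names.
raw = {"pe_ratio": [12.0, 30.0, 18.0], "momentum": [-0.2, 0.1, 0.4]}
obfuscated = anonymize_and_normalize(raw)
print(obfuscated)
```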

amenghra 3278 days ago

This is very interesting from an academic/theory point of view.

There currently aren't a lot of practical use cases where we can afford a performance loss of ~100,000,000x (your homomorphic crypto algorithm is going to run on the order of ~Hz on a ~GHz CPU).
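The back-of-envelope arithmetic behind that estimate, assuming (as a simplification) roughly one basic operation per clock cycle:

```latex
\frac{10^{9}\ \text{operations/s (1 GHz CPU)}}{10^{8}\ \text{(slowdown factor)}} = 10\ \text{operations/s}
```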

tgtweak 3278 days ago

There are applications for it, despite the speed penalty.

dsacco 3278 days ago

What are those applications?

tgtweak 3278 days ago

Any application where you need an untrusted third party to do operations on encrypted data, without them getting access to the data itself.

dsacco 3278 days ago

Right, but that's nearly tautological. I meant specific applications that can tolerate the massive performance sacrifice.