Hacker News new | ask | show | jobs
by proofofstake 3231 days ago
Yes, but not to the same degree. Numerai uses structure-preserving encryption / neural encryption. This allows people to use any existing machine learning algorithm on the data. For fully homomorphic encryption you would need specialized algorithms. These are way more difficult to design. They also run slower.
1 comments

Worth pointing out that Numerai actually doesn't use encryption in any standard sense (including structure-preserving encryption), but instead seems to be using some heuristic method of obfuscating their data.

Their (closed-source) method of obfuscating their data apparently does have the property that it preserves the structure of the data, but calling it "structure-preserving encryption" is misleading imo since it risks confusing it with standard notions of encryption and structure-preserving encryption which have much stronger security guarantees.

(Their marketing seems to encourage this conflation by, for example, citing academic advances in standard notions of homomorphic encryption and SPE and implying that these advances have enabled Numerai's technology)

https://medium.com/numerai/encrypted-data-for-efficient-mark...

They switched from structure-preserving encryption to neural encryption (using a neural net's layer activations).

> Just a few months ago this package was released by Louis Aslett at Oxford http://www.louisaslett.com/HomomorphicEncryption/. Louis helped me use his package to do Fan and Vercauteren homomorphic encryption on my dataset. Because the ciphertexts are polynomials it's not too easy for an average data scientist to use the data. That's why I came up with more chill ways of encrypting Numerai's data that the article mentions like order-preserving encryption. There's a security vs easy of use trade off, for sure. But homomorphic encryption is a real thing.

https://www.reddit.com/r/MachineLearning/comments/3zvuge/enc...

Louis Aslett authored https://arxiv.org/abs/1508.06574

Thanks for that link! I was never able to find any details from someone who works for Numerai (or claims to at least)

I still don't think it's fair to market their method as comparable to Aslett's scheme or "standard" notions of homormorphic/order preserving encryption, no matter how "chill" they are :)

Do you think neural encryption is closer to encryption? GAN-style: One network encrypts while preserving structure, another network tries to reverse engineer to the original features.

Edit: No specific sources for what Numerai is using, but in general: https://arxiv.org/abs/1610.06918 "Learning to Protect Communications with Adversarial Neural Cryptography".

Edit2: Yes, in general. I would say "yes, this is a valid form of encryption". But I do agree that their marketing was perhaps a bit too optimistic. I have no problem calling it "obfuscation" either (I just think their method of "obfuscation" is way more advanced than removing headers and normalizing within 0-1).

I've never heard of neural encryption before, do you have a good source for me?

From the Wikipedia page for "Neural cryptography", it seems like there's some success in using NN's for cryptanalysis, but not for constructions...

Edit: Do you mean the Google GAN experiment? (https://arxiv.org/pdf/1610.06918v1.pdf) Ahh ok, well at least for this there looks like an attempt at defining a security model (security against some other NN). I don't really believe the security model is realistic (how do we know NN's are really that effective as adversaries?), but at least there is a model, so calling that "encryption" sits somewhat better with me. I'm pretty sure this is not what's being used by Numerai since it seems like it would not result in ciphertexts with the structure necessary to perform ML operations on.

Edit2: Maybe you're right and it is more advanced. In any case, as a crypto nerd I wish they would disclose what they are actually doing / the rationale instead of tantalizingly suggesting that they have made (what would be) a breakthrough in a practical use case of advanced encryption schemes, but not saying how.

That particular paper got a LOT of flames when it was posted to /r/MachineLearning

https://www.reddit.com/r/MachineLearning/comments/59v9ua/r_1...

This is a great and really important point. But shouldn't warranting the "encryption" label come down to the security guarantee of the algorithm in question, in the use cases it's meant for? I don't see why non-traditional data obfuscation algorithms can't be equally valid methods of "encryption" just because they are architecturally different. After all there are plenty of traditional encryption approaches that have been deprecated and deemed insufficiently insecure. So it'd be a bigger problem if the definition of encryption became so narrow that practitioners assume security guarantees based on use of the term alone.

IMO the key is that new encryption methods demonstrate levels of security in their intended use cases that users deem sufficiently strong. I agree that all closed-source methods leave room for misleading statements (and I won't speak to Numerai as I don't have any additional insight on their approach). But any algorithm can and should be benchmarked--generally and vs. common standards--to avoid confusion. Serious commercial & public sector users will be quick to ensure so, especially in lieu of massive social proof.

My thought here is that if we draw the line on what warrants "encryption" by saying it's got to be a traditional key-based system, or sufficiently similar to existing approaches, we risk stifling innovation in the space by denying new entrants an industry-standard term that buyers are trained to seek. Non-traditional doesn't necessarily mean non-secure. Would love to hear your thoughts on this.

I think it's appropriate to restrict usage of the word "encryption" to methods that come with some justification for their security.

I don't think that doing this stifles innovation (In fact, I think requiring crypto innovations to have security justifications is probably better overall for innovation in the field)

Your specific statement was valid and good-natured, so I didn't mean to attack you here. All I'm arguing against is the adoption of a legal or de-facto definition of "encryption" based on architectural similarity to currently popular methods, rather than actual evaluation in situ. Most all practical applications of crypto trade security for functionality to some extent, and requirements for one or the other vary based on the use case. Future security gains, moreover, won't likely all be made on traditional grounds (e.g. increased susceptibility to brute force hacks). That's why I'm hesitant to support statements like "X company is misleading us by using the term 'encryption' because its closed-source approach doesn't appear to adhere to conventional notions on cursory inspection."

Much practical innovation comes from closed-source applications whose peer review comes in the form of commercial lab tests. Overly ossified technology standards & labels often force CIOs / CTOs / CISOs to build artificial barriers into their corporate procurement processes for optical reasons. In addition to engendering "check-the-box" complacency, these barriers absolutely stifle startup-driven innovation.

Interesting. Any insight into what specific algorithms these are ? I would love to play around with my own data this way.

It wouldn't be a simple hash or something, would it ?

I suspect they might just be anonymizing their features (removing any labels), then normalizing them to [0,1].