by BickNowstrom 3478 days ago

    """
    https://arxiv.org/abs/1508.06574
    "An encryption scheme is said to be homomorphic 
    if certain mathematical operations can be applied 
    directly to the cipher text in such a way that 
    decrypting the result renders the same answer as 
    applying the function to the original unencrypted 
    data."
    The function = GradientBoostingRegressor
    the cipher text = X_encrypted
    original data = X
    same answer = mean absolute error
    """
    import numpy as np
    from sklearn.metrics import mean_absolute_error
    from sklearn.ensemble import GradientBoostingRegressor

    # Replicability
    np.random.seed(0)

    # Create a data set with 1000 samples and 3 features
    X = np.random.randint(0, 60, (1000,3))

    # Create ground truth (the product of the three 
    # features - 100) / 11
    y = (np.prod(X, axis=1) - 100) / 11.
    
    # Encrypt y
    y_encrypted = y + 20

    # Encrypt X
    X_encrypted = X * -0.5

    # Init our model
    rgr = GradientBoostingRegressor(random_state=42)

    # Fit model on the first 500 unencrypted samples
    rgr.fit(X[:500], y[:500])

    # Predict the remaining 500 samples
    preds = rgr.predict(X[500:])

    # Fit model on the first 500 encrypted samples, using the encrypted target
    rgr.fit(X_encrypted[:500], y_encrypted[:500])

    # Predict the remaining encrypted samples and decrypt
    preds_decrypted = rgr.predict(X_encrypted[500:]) - 20

    # Evaluate both models
    print(mean_absolute_error(y[500:], preds))
    print(mean_absolute_error(y[500:], preds_decrypted))

    # Output from the author's run:
    #>>> 323.09
    #>>> 323.72
1 comments

The encryption here is being done by "adding 20" / "multiplying by -0.5"?

Given this "encrypted" X, y dataset, I could easily find the unencrypted version. Even if I don't know the 20 or the -0.5, the data still reveals so much of its structure that I don't believe this provides any real protection against anyone but the laziest attackers.
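
To make that concern concrete, here is a minimal sketch of a known-plaintext attack. It assumes the attacker has somehow obtained a handful of original values together with their transformed counterparts, which is an assumption and not something the post states. The helper `recover_affine` and the variable names are hypothetical. A least-squares line fit recovers the scale and offset, after which every other value can be inverted.

```python
import numpy as np


def recover_affine(known_plain, known_cipher):
    """Fit cipher = a * plain + b by least squares and return (a, b)."""
    a, b = np.polyfit(known_plain, known_cipher, deg=1)
    return a, b


rng = np.random.default_rng(0)

# Same kind of data and the same transforms as in the post
X = rng.integers(0, 60, size=(1000, 3))
y = (np.prod(X, axis=1) - 100) / 11.0
X_encrypted = X * -0.5
y_encrypted = y + 20

# Assumption: the attacker knows 10 plaintext/ciphertext pairs
# for one feature column and for the target.
a_x, b_x = recover_affine(X[:10, 0], X_encrypted[:10, 0])
a_y, b_y = recover_affine(y[:10], y_encrypted[:10])

# Invert the fitted transforms on the full data set
X_recovered = (X_encrypted - b_x) / a_x
y_recovered = (y_encrypted - b_y) / a_y

print(np.allclose(X_recovered, X), np.allclose(y_recovered, y))
```

The same scale applies to all three feature columns here, so fitting on one column is enough; with a different transform per column the fit would be repeated column by column.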

It is a toy example to show that a form of homomorphic encryption is possible without going all the way to Fully Homomorphic Encryption.

And simple linear transforms on already anonymized features are not as easy to reverse engineer as you might think. Just try it on a few datasets from UCI.

Ah ok, sure. I wouldn't call something like a linear transform on anonymized features "encryption" (more like obfuscation?), but I guess it's good marketing in that it lets them associate with the "recent advances in [real] homomorphic encryption".

If you desire something more one-way, consider PCA, random projections, feature expansions (with something like Random Bits Regression), hashing, or the last hidden layer activations of your best in-house neural net. Then combine these approaches for good measure.
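
As one illustration of the options listed above, here is a minimal sketch of a Gaussian random projection in plain NumPy. The data, the dimensions, and all variable names are made up for the example. Because the projection maps 50 features down to 20, it has no inverse: distinct inputs can share the same projected output, which is the sense in which it is "more one-way" than a per-feature linear transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples with 50 features each
n_samples, d, k = 200, 50, 20
X = rng.normal(size=(n_samples, d))

# Random Gaussian projection matrix, scaled so that squared
# distances are preserved in expectation.
R = rng.normal(size=(d, k)) / np.sqrt(k)
X_projected = X @ R

# R has rank k < d, so the map X -> X @ R discards information.
rank = np.linalg.matrix_rank(R)

# Any vector orthogonal to the column space of R is invisible
# after projection. The trailing columns of U span that space.
U, _, _ = np.linalg.svd(R, full_matrices=True)
v = U[:, -1]
X_shifted = X + 5.0 * v  # a different data set ...
same_output = np.allclose(X_shifted @ R, X_projected)  # ... same projection

print(X_projected.shape, rank, same_output)
```

scikit-learn ships a comparable transformer, `GaussianRandomProjection`, in `sklearn.random_projection`; the NumPy version is used here only to keep the mechanics visible.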

Agreed on the clever marketing, but at least they put their money (expensive dataset) where their mouth is (release it to reverse engineers the world over).

Fully Homomorphic Encryption challenges would be interesting, but they would disqualify our current state-of-the-art algorithms and reduce the playing field to a handful of people who know how to write algorithms that work with Fully Homomorphic Encryption (if any such competitor is even allowed to work on this, and is not too busy working for the NSA).