by BickNowstrom
3478 days ago
"""
https://arxiv.org/abs/1508.06574
"An encryption scheme is said to be homomorphic
if certain mathematical operations can be applied
directly to the cipher text in such a way that
decrypting the result renders the same answer as
applying the function to the original unencrypted
data."
The function = GradientBoostingRegressor
The cipher text = X_encrypted
The original data = X
The same answer = (approximately) the same mean absolute error
"""
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import GradientBoostingRegressor
# Replicability
np.random.seed(0)
# Create a data set with 1000 samples and 3 features
X = np.random.randint(0, 60, (1000,3))
# Create ground truth (the product of the three
# features - 100) / 11
y = (np.prod(X, axis=1) - 100) / 11.
# Encrypt y
y_encrypted = y + 20
# Encrypt X
X_encrypted = X * -0.5
# Init our model
rgr = GradientBoostingRegressor(random_state=42)
# Fit model on the first 500 unencrypted samples
rgr.fit(X[:500], y[:500])
# Predict the remaining 500 samples
preds = rgr.predict(X[500:])
# Fit model on the first 500 encrypted samples (encrypted features and target)
rgr.fit(X_encrypted[:500], y_encrypted[:500])
# Predict the remaining encrypted samples and decrypt by removing the +20 offset
preds_decrypted = rgr.predict(X_encrypted[500:]) - 20
# Evaluate both models against the unencrypted targets
print(mean_absolute_error(preds, y[500:]))
print(mean_absolute_error(preds_decrypted, y[500:]))
#>>> 323.09
#>>> 323.72
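Why the two scores come out so close: a regression tree only ever compares one feature against a threshold, so multiplying a feature by -0.5 reverses the order of its values without changing which partitions of the samples are possible, and adding 20 to the target shifts every leaf mean by exactly 20. The sketch below checks this on a single-split "decision stump" written in plain NumPy. `fit_stump` and `predict_stump` are hypothetical helpers for illustration, not scikit-learn's implementation, and the toy data is made up so that the best split is unique.

```python
import numpy as np

def fit_stump(x, y):
    """Hypothetical helper: exhaustive search for the single threshold on a
    1-D feature x that minimises the summed squared error of two leaf means."""
    best = None
    values = np.unique(x)  # sorted distinct feature values
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2.0  # midpoint threshold between neighbouring values
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean

def predict_stump(stump, x):
    """Hypothetical helper: predict the leaf mean on each side of the threshold."""
    t, left_mean, right_mean = stump
    return np.where(x <= t, left_mean, right_mean)

# Toy data with one clearly best split (between 3 and 10).
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 40.0, 41.0, 39.0])

# The "encryption" from the post: scale X by -0.5, shift y by +20.
x_enc = x * -0.5
y_enc = y + 20

plain = predict_stump(fit_stump(x, y), x)
decrypted = predict_stump(fit_stump(x_enc, y_enc), x_enc) - 20

print(np.allclose(plain, decrypted))  # → True
```

With the default squared-error loss, GradientBoostingRegressor starts from the mean of the target (which also shifts by 20) and fits each further tree to residuals, which the shift leaves unchanged, so the same argument plausibly carries over to the whole ensemble, up to tie-breaking and floating-point details.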
|
|
Given this "encrypted" X, y dataset, I could easily recover the unencrypted version. Even if I didn't know the 20 or the -0.5, this still reveals so much of the structure that I don't believe it provides any real protection against anything but the laziest attackers.
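One way to see how much structure survives, using only the post's own data: the encrypted target is still an exact linear function of the product of the encrypted features, because (-0.5)^3 = -0.125 gives y_encrypted = -(8/11) * prod(X_encrypted) + 120/11. The sketch below assumes an attacker who sees only `X_encrypted` and `y_encrypted` and simply tries the hypothesis "the target depends on the product of the features"; the variable names `p_enc`, `slope`, `intercept` and `r` are illustrative.

```python
import numpy as np

# Reproduce the data set and "encryption" from the post.
np.random.seed(0)
X = np.random.randint(0, 60, (1000, 3))
y = (np.prod(X, axis=1) - 100) / 11.
X_encrypted = X * -0.5
y_encrypted = y + 20

# The attacker works only on the encrypted columns and tests the
# hypothesis "the target is linear in the product of the features".
p_enc = np.prod(X_encrypted, axis=1)
slope, intercept = np.polyfit(p_enc, y_encrypted, 1)
r = np.corrcoef(p_enc, y_encrypted)[0, 1]

# Exact relation: y_enc = -(8/11) * p_enc + 120/11, so r is -1 up to rounding.
print(slope, intercept, r)
```

The fitted slope and intercept fold the secret -0.5 and +20 into other constants, so on their own they don't decrypt anything, but a perfect linear fit exposes the target's generating formula just as clearly as it would on the plaintext.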