by Shamar
1751 days ago
Beyond the global output and error of each sample from the last epoch, the log also includes the weight update of a single (fully connected) node for each layer. During the compilation phase, the training dataset is projected onto a complex vector space that is constituted by both the "model" of the "neural network" and these logs. It's just like projecting a shadow onto a two-dimensional surface: if you discard the data pertaining to one dimension, you have no hope of guessing what cast it; you need both dimensions. The logs that are preserved in the compilation process are the part of the vector space that is usually discarded during the "training". But discarding the "model" would have exactly the same effect: you cannot get back the source dataset from those logs alone. That's why this does not "smuggle the training dataset back". Indeed, the fact that the source dataset is obtainable from the pair "these logs" + "final model", but neither from "these logs" alone nor from the final model alone, proves that a substantial portion of the source dataset is always embedded in the "model", which becomes a derivative work of the sources.
|
Basically the argument starts with a claim (you can reconstruct the training set of model X from its weights alone) and then shows something totally different. Of course you can reconstruct from the gradient updates plus the weights; that's not interesting, nor does it support the claim.
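
To make the "gradient updates" part concrete, here is a minimal sketch of why a logged per-sample update gives the input away. It assumes a toy setup that is not taken from the thread: one linear node, squared-error loss, plain per-sample SGD, and a known learning rate. The names `sgd_update` and `reconstruct_input` are hypothetical. Under those assumptions the update is just the input scaled by `-lr * error`, so dividing it back out recovers the sample. In this toy case the weights are only used to cross-check the logged output; in a deeper network they would be needed to work out each node's error term.

```python
def sgd_update(x, error, lr):
    """Weight update of one linear node for one sample (squared-error loss)."""
    # Gradient of 0.5 * error**2 with respect to the weights is error * x.
    return [-lr * error * xi for xi in x]


def reconstruct_input(delta_w, error, lr):
    """Invert the update: delta_w = -lr * error * x  =>  x = -delta_w / (lr * error)."""
    if error == 0:
        raise ValueError("zero error: the update carries no information about the input")
    return [-dw / (lr * error) for dw in delta_w]


# Hypothetical toy values, chosen only for illustration.
w = [0.5, -1.0, 2.0]    # weights of the node before the update
x = [1.0, 2.0, -0.5]    # the training sample we pretend not to know
y = 0.0                 # its target
lr = 0.1                # learning rate (assumed known)

# What the training run would log for this sample.
output = sum(wi * xi for wi, xi in zip(w, x))
error = output - y
delta_w = sgd_update(x, error, lr)

# Reconstruction from the log, then a cross-check against the weights.
x_rec = reconstruct_input(delta_w, error, lr)
output_rec = sum(wi * xi for wi, xi in zip(w, x_rec))
print(x_rec, output_rec)
```

The recovered `x_rec` matches `x` up to floating-point rounding, and feeding it back through `w` reproduces the logged output.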