by leecarraher
1553 days ago
For synthetic data generation, what methods are they using to sample data from the distribution? What assumptions about the distribution are being made? Does it model correlations between sample attributes that could adversely affect some ML methods (multicollinearity can cause problems)?
This being said, the goal of Sarus is to enable analysis on the original data with a privacy guarantee on the result (synthetic data is merely used as a tool and a fallback when there is no better solution), so you can write a statistical test to detect multicollinearity and run it on the original data within Sarus.
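
For illustration, one common way to check for multicollinearity is the variance inflation factor (VIF). The sketch below is plain NumPy and does not use any Sarus API; the function name `variance_inflation_factors`, the toy data, and the threshold of 10 (a frequently quoted rule of thumb, not a hard limit) are all assumptions made for this example. It relies on the identity that the VIFs are the diagonal of the inverse correlation matrix.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are samples, columns are attributes).

    Hypothetical helper for illustration only; not part of any Sarus API.
    """
    X = np.asarray(X, dtype=float)
    # Pearson correlation matrix between columns.
    corr = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix equals the VIFs.
    # np.linalg.inv raises LinAlgError if columns are exactly collinear.
    return np.diag(np.linalg.inv(corr))


# Toy data (assumed): c is almost a linear combination of a and b, d is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.1 * rng.normal(size=500)
d = rng.normal(size=500)
X = np.column_stack([a, b, c, d])

vifs = variance_inflation_factors(X)
# Flag attributes whose VIF exceeds the rule-of-thumb threshold.
flagged = [i for i, v in enumerate(vifs) if v > 10]
print(vifs, flagged)
```

Here the first three columns should be flagged and the independent one should not. Whether a check like this can be run as-is inside Sarus, and what the privacy mechanism does to its result, is not something this sketch addresses.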