by ivalm 2309 days ago
I agree that this is very tricky. The most interesting synthetic healthcare data generation I have seen used causal inference: SMEs bake in a bunch of expert knowledge during skeleton construction, the weights on the edges are then estimated from a smaller dataset, and the synthetic data is generated from the resulting graph. At the same time, it is very hard to ensure that your synthetic dataset actually reflects the real world. On one hand, SME knowledge might give extra oomph to synthetic data generation (this knowledge is equivalent to some highly abstracted training); on the other hand, if the "expert knowledge" is wrong, it's a recipe for disaster.
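
To make the idea concrete, here is a rough sketch of that pipeline, not the method from whatever work I saw. It assumes a linear Gaussian structural equation model, which is my own simplification. The variable names, the `SKELETON` dict, the helper functions, and the toy "small dataset" (generated from made-up coefficients) are all hypothetical.

```python
import random
import statistics

# Hypothetical SME-provided skeleton: each variable maps to its direct causes.
# Keys are listed in topological order (parents before children).
SKELETON = {
    "age": [],
    "bmi": ["age"],
    "blood_pressure": ["age", "bmi"],
}


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_edge_weights(rows, skeleton):
    """Fit one least-squares regression per node on its parents."""
    model = {}
    for node, parents in skeleton.items():
        # Design matrix with an intercept column.
        xs = [[1.0] + [row[p] for p in parents] for row in rows]
        ys = [row[node] for row in rows]
        k = len(parents) + 1
        xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
        xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
        coef = solve_linear(xtx, xty)
        residuals = [y - sum(c * v for c, v in zip(coef, x)) for x, y in zip(xs, ys)]
        model[node] = {
            "intercept": coef[0],
            "weights": dict(zip(parents, coef[1:])),
            "noise_sd": statistics.pstdev(residuals),
        }
    return model


def sample(model, skeleton, n, rng):
    """Ancestral sampling: draw each node after its parents."""
    out = []
    for _ in range(n):
        row = {}
        for node in skeleton:
            p = model[node]
            mean = p["intercept"] + sum(w * row[q] for q, w in p["weights"].items())
            row[node] = rng.gauss(mean, p["noise_sd"])
        out.append(row)
    return out


def make_toy_data(n, rng):
    """Stand-in for the small dataset; coefficients are made up."""
    rows = []
    for _ in range(n):
        age = rng.gauss(50, 10)
        bmi = 20 + 0.1 * age + rng.gauss(0, 2)
        bp = 80 + 0.5 * age + 1.0 * bmi + rng.gauss(0, 5)
        rows.append({"age": age, "bmi": bmi, "blood_pressure": bp})
    return rows


rng = random.Random(0)
small_data = make_toy_data(500, rng)
model = fit_edge_weights(small_data, SKELETON)
synthetic = sample(model, SKELETON, 1000, rng)
```

Note that the failure mode I mentioned shows up directly here: if the SMEs leave an edge out of `SKELETON` (say `bmi -> blood_pressure`), the fit still succeeds and the samples still look plausible, but the dependence between those variables is wrong, and nothing in the pipeline flags it.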