by mitchellpkt 1201 days ago
It warrants professionally cautious skepticism, because the synthetic data generation process often involves assumptions and approximations, and can introduce spurious dynamics or omit real ones. These limitations then carry over into the model, which ends up learning dynamics that are present in the synthetic data but not in real life.

Not saying there is no place for synthetic data. It just needs to be used in situations where the salient dynamics are well understood, so that they are realistically reproduced in the generated data.