
I actually like this much better than synthetic data augmentation. I think synthetic augmentation, as with GANs, is a failed concept.

There have long been theoretical limits on how much you can gain by ensembling with a model of known limitations, and at root that is all synthetic training data is.

You can’t “make up” training data that lets you escape the performance ceiling implied by whatever generator process produced the synthetic data, any more than you can learn a better regression just by bootstrapping a large sample of data from your existing training set.
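To make the bootstrapping comparison concrete, here is a minimal sketch, not a proof. It assumes NumPy, a made-up linear ground truth (`TRUE_COEF`), and arbitrary sample sizes; all of those are my own illustrative choices. It fits ordinary least squares on a small real sample, on a 100x bootstrap resample of that same sample, and on a genuinely fresh sample of the larger size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth (intercept, slope), chosen only for illustration.
TRUE_COEF = np.array([1.0, 2.0])


def sample(n):
    """Draw n real observations from the assumed true process."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = TRUE_COEF[0] + TRUE_COEF[1] * x + rng.normal(0.0, 1.0, size=n)
    return x, y


def fit(x, y):
    """Ordinary least squares for y = a + b * x; returns [a, b]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# 1) The small real training set we actually have.
x_small, y_small = sample(50)
coef_small = fit(x_small, y_small)

# 2) "Synthetic" data: resample the same 50 points with replacement, 100x over.
idx = rng.integers(0, len(x_small), size=5000)
coef_boot = fit(x_small[idx], y_small[idx])

# 3) What new information looks like: 5000 fresh draws from the true process.
x_big, y_big = sample(5000)
coef_big = fit(x_big, y_big)

for name, coef in [("small", coef_small), ("bootstrap", coef_boot), ("fresh", coef_big)]:
    err = np.abs(coef - TRUE_COEF).max()
    print(f"{name:>9}: coef={coef.round(3)}  max abs error vs truth={err:.3f}")
```

Under these assumptions, I'd expect the bootstrap fit to land close to the small-sample fit, inheriting whatever error it had, while only the fresh sample reliably moves toward the true coefficients. The resample adds rows but no new information.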

Algorithmically generated synthetic data is mostly fool’s gold.