by mark_l_watson 550 days ago
It is likely Apple can get additional data by creating synthetic data for user interactions.

About 7 years ago I trained GAN models to generate synthetic data, and it worked well. The state of the art has advanced a lot in 7 years, so Apple will be fine.

1 comment

For a while there I would have agreed with you, but the idea that models can be trained purely on synthetic data has been shown to be wrong on multiple levels. Synthetic data needs to be reviewed by people to ensure its quality, which significantly slows how quickly an organization can put new training data to use. Reasonable engineers would suggest having other language models review the synthetic data instead, but we have seen that this is what leads to model collapse, because hallucinations compound from one round to the next.

At best, synthetic data is a "slow follow" for training a model because of the need for human review, but a competitive model, it does not make.