by bradneuberg 3393 days ago
One example of synthetic data generation was for our OCR project. We took a corpus of word choices (Project Gutenberg, modern books, the UPC database for receipts, etc.), took several thousand fonts, and combined them with geometric transformations that mimic distortions like shadows, creases, etc. to bootstrap millions of fake, OCR-like scannable documents.
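To make the idea concrete, here is a toy sketch of that kind of pipeline: pick text from a corpus, render it in a font, distort it, and keep the known text as the label. This is not our actual code. Everything in it is a hypothetical stand-in: `GLYPHS` is a hand-made 3x5 bitmap "font" in place of real font files, `CORPUS` is a five-word list in place of a real text source, and `shear` and `add_noise` are crude substitutes for realistic distortions like creases and shadows. It uses only the Python standard library.

```python
import random

# Toy 3x5 bitmap "font" (hypothetical stand-in for real font files).
GLYPHS = {
    "A": ["010", "101", "111", "101", "101"],
    "C": ["011", "100", "100", "100", "011"],
    "O": ["111", "101", "101", "101", "111"],
    "R": ["110", "101", "110", "101", "101"],
    "T": ["111", "010", "010", "010", "010"],
}

# Tiny word list (hypothetical stand-in for a real text corpus).
CORPUS = ["CAT", "TACO", "ACTOR", "ROTA", "OCTO"]


def render(word):
    """Render a word as a 5-row binary image with one blank column between glyphs."""
    return [
        [int(px) for px in "0".join(GLYPHS[ch][r] for ch in word)]
        for r in range(5)
    ]


def shear(image, max_shift):
    """Shift each row right by an amount that grows with the row index (a crude slant)."""
    height = len(image)
    out = []
    for r, row in enumerate(image):
        shift = (max_shift * r) // (height - 1)
        # Pad on both sides so every row keeps the same total width.
        out.append([0] * shift + row + [0] * (max_shift - shift))
    return out


def add_noise(image, flip_prob, rng):
    """Flip each pixel independently with probability flip_prob (crude speckle)."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in image
    ]


def generate(n, seed=0):
    """Return n (image, label) pairs; the label is the text that was rendered."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        word = rng.choice(CORPUS)
        image = render(word)
        image = shear(image, rng.randint(0, 3))
        image = add_noise(image, 0.02, rng)
        samples.append((image, word))
    return samples


if __name__ == "__main__":
    image, label = generate(1, seed=42)[0]
    print("label:", label)
    for row in image:
        print("".join("#" if px else "." for px in row))
```

The point of the sketch is that the label comes for free: because the generator chose the text, every distorted image already has ground truth attached, so scaling to millions of samples is only a matter of compute.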

We aren't using GANs yet, but are definitely keeping an eye on them. Work like InfoGAN, which has the GAN learn a ground-truth-like label, is very promising, but GANs don't yet work at the image sizes necessary to make this practical. I do think in the next year or two we will see these problems solved and GANs will become an integral part of synthetic data generation.

1 comment

Ooh, neat! I've used GANs to generate synthetic temporal sequence data for training electrophysiologic (i.e., EEG, EMG) signal decoders. In fact, I wrote up the section of my dissertation on this topic today! In my experience it worked quite a bit better than other generative techniques (I've used convolutional variational autoencoders in the past for this and had so-so results). Looking forward to seeing what you guys do with this!