by bradneuberg
3393 days ago
One example of synthetic data generation was our OCR project. We took corpora of word choices (Project Gutenberg, modern books, the UPC database for receipts, etc.), took several thousand fonts, and combined them with geometric transformations that mimic distortions like shadows, creases, etc. to bootstrap millions of fake, OCR-like scannable documents. We aren't using GANs yet, but are definitely keeping an eye on them. Work like InfoGAN, which has the GAN learn a ground-truth-like label, is very promising, but GANs don't yet work at the image sizes necessary to really deliver on that. I do think that in the next year or two we will see these problems solved and GANs will become an integral part of synthetic data generation.
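
To make the pipeline above a bit more concrete, here is a minimal sketch of just the sampling step: pick words, pick a font, and draw a random geometric distortion, keeping the text as the ground-truth label. This is not the actual code from the project. `CORPUS`, `FONTS`, `random_affine`, `apply_affine`, and `make_sample` are hypothetical names, the word list and font file names are placeholders, and the distortion is assumed to be a simple affine warp (rotation, shear, scale). Shadows and creases would need photometric and non-linear warps that are not modeled here, and the actual rendering is left to whatever imaging library you use.

```python
import math
import random

# Placeholder stand-ins for the real sources mentioned above
# (Project Gutenberg, modern books, the UPC database, thousands of fonts).
CORPUS = ["receipt", "total", "subtotal", "whale", "chapter", "12.99"]
FONTS = ["FontA.ttf", "FontB.ttf", "FontC.ttf"]  # hypothetical file names


def random_affine(rng, max_rotate_deg=5.0, max_shear=0.15, max_scale_jitter=0.1):
    """Sample a 2x3 affine matrix (a, b, tx, c, d, ty) = scale * rotation @ shear."""
    theta = math.radians(rng.uniform(-max_rotate_deg, max_rotate_deg))
    shear = rng.uniform(-max_shear, max_shear)
    scale = 1.0 + rng.uniform(-max_scale_jitter, max_scale_jitter)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = scale * cos_t
    b = scale * (cos_t * shear - sin_t)
    c = scale * sin_t
    d = scale * (sin_t * shear + cos_t)
    return (a, b, 0.0, c, d, 0.0)


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    a, b, tx, c, d, ty = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def make_sample(rng, n_words=4, char_width=12, line_height=24):
    """Build one synthetic sample spec: label text, font, warp, warped text box."""
    label = " ".join(rng.choice(CORPUS) for _ in range(n_words))
    font = rng.choice(FONTS)
    matrix = random_affine(rng)
    # Nominal, unwarped bounding box of the rendered line (a rough assumption).
    width, height = char_width * len(label), line_height
    box = [(0, 0), (width, 0), (width, height), (0, height)]
    corners = [apply_affine(matrix, p) for p in box]
    return {"label": label, "font": font, "matrix": matrix, "corners": corners}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a dataset can be regenerated
    for _ in range(3):
        print(make_sample(rng))
```

Each spec would then be handed to a renderer that draws `label` in `font` and warps the image with `matrix`. Because the label is chosen before rendering, every generated image comes with exact ground truth for free, which is the main appeal of this approach.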