by n4atki 849 days ago
SDV does offer a CTGANSynthesizer, which is a GAN-based generative approach. Could be worth a try, though CTGAN specifically may need some tuning of its parameters before it works well on your data.
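
For anyone who wants to try it, here is a rough sketch of what that could look like. It assumes the SDV 1.x single-table API (SingleTableMetadata plus CTGANSynthesizer); newer releases may prefer a different metadata class. The helper name and the epochs/batch_size values are my own placeholders, not recommendations.

```python
def fit_and_sample_ctgan(real_df, num_rows, epochs=300, batch_size=500):
    """Fit SDV's CTGANSynthesizer on a pandas DataFrame and sample new rows.

    Hypothetical helper; assumes the SDV 1.x API. The imports live inside
    the function so the sketch can be read and loaded without SDV installed.
    """
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import CTGANSynthesizer

    # Let SDV infer column types from the data. Worth checking by hand,
    # since a wrongly detected type is a common cause of poor samples.
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(real_df)

    # epochs and batch_size are the kind of parameters that may need tweaking.
    synthesizer = CTGANSynthesizer(
        metadata,
        epochs=epochs,
        batch_size=batch_size,
        verbose=True,
    )
    synthesizer.fit(real_df)

    # Returns a DataFrame with the same columns as real_df.
    return synthesizer.sample(num_rows=num_rows)
```

Usage would be something like `synthetic_df = fit_and_sample_ctgan(my_df, num_rows=1000)`, where `my_df` is your own table.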

That being said, synthetic data definitely isn't a magic pill for all use cases. I have found it particularly useful for things like QA and performance testing, where the alternative tools for creating test data aren't sufficient.

For the use case of imbalanced classification: it may be worth asking what it is about existing solutions (such as SMOTE) that doesn't work well.
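
For context on what SMOTE actually does, here is a minimal sketch of the core idea: pick a minority-class point, pick one of its k nearest minority neighbours, and create a new point somewhere on the line between them. This is a simplified illustration, not the imbalanced-learn implementation; the function name and sample data are made up.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (minority class only).
    Simplified sketch of the SMOTE idea, for illustration.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []

    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]

        # Indices of the k nearest other minority points (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]

        # New point lies on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )

    return synthetic


# Made-up minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6, k=2)
```

Because every new point is an interpolation between existing minority samples, it never leaves the region those samples already cover. Whether that is a limitation for a given dataset is the sort of thing worth pinning down before reaching for a GAN.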