Show HN: A real-world streaming data generator in Python (github.com)
1 point by ashishbagri 409 days ago
I've built GlassGen to solve a common problem: generating real-time synthetic data for testing, demos, and ML datasets. Faker is great for individual data points; GlassGen adds:

- Configurable data publishing (CSV, Kafka, webhooks)

- Precise rate control (records/second)

- Controlled data duplication

- Extensible architecture for custom generators and sinks
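To make the rate-control and duplication features concrete, here is a minimal sketch in plain Python of how a generator might pace output to a target records/second and re-emit a configurable fraction of earlier records. It is not GlassGen's implementation or API: `stream`, `make_record`, `duplication_ratio` and `history_size` are hypothetical names, and a seeded `random.Random` stands in for Faker so the example runs with only the standard library.

```python
import random
import time
import uuid
from typing import Callable, Dict, List

Record = Dict[str, object]


def make_record(rng: random.Random) -> Record:
    # Hypothetical stand-in for a Faker-backed record generator.
    return {
        "id": str(uuid.UUID(int=rng.getrandbits(128))),
        "amount": round(rng.uniform(1, 500), 2),
    }


def stream(
    num_records: int,
    rps: float,
    duplication_ratio: float,
    sink: Callable[[Record], None],
    seed: int = 0,
    history_size: int = 1000,
) -> None:
    """Emit num_records records at roughly rps records/second.

    With probability duplication_ratio, a record is a re-send of one of the
    last history_size fresh records instead of a newly generated one.
    """
    rng = random.Random(seed)
    history: List[Record] = []
    start = time.monotonic()
    for i in range(num_records):
        # Record i is due at start + i / rps; sleep only if we are early.
        delay = start + i / rps - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        if history and rng.random() < duplication_ratio:
            record = rng.choice(history)  # deliberate duplicate
        else:
            record = make_record(rng)
            history.append(record)
            if len(history) > history_size:
                history.pop(0)  # bound memory used for duplicate candidates
        sink(dict(record))  # copy so a sink cannot mutate the history


if __name__ == "__main__":
    received: List[Record] = []
    stream(200, rps=1000, duplication_ratio=0.1, sink=received.append)
    print(len(received), "records,",
          len(received) - len({r["id"] for r in received}), "duplicates")
```

Scheduling each record against a fixed start time, rather than sleeping a constant interval after each send, keeps the average rate on target even when an individual sink call is slow: a late record simply skips its sleep and the stream catches up.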

Key features:

- Built on top of Faker for reliable data generation

- Simple JSON/YAML configuration

- Support for complex data relationships

- Real-time data streaming to Kafka

- Custom sink implementations
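The JSON-configuration and custom-sink points can be illustrated with a similar sketch: a config names each field and its type, a generator builds records from it, and any object implementing a small `Sink` interface receives them. The config layout, the type names, and the `Sink`/`CSVSink`/`ListSink` classes are assumptions for illustration only; GlassGen's actual configuration schema and sink interface are described in its docs.

```python
import csv
import io
import json
import random
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, TextIO

# Hypothetical config format, not GlassGen's real schema.
CONFIG = json.loads("""
{
  "schema": {"user_id": "int", "email": "email", "score": "float"},
  "num_records": 5,
  "seed": 7
}
""")

# Maps a type name in the config to a function producing a value.
FIELD_GENERATORS: Dict[str, Callable[[random.Random], object]] = {
    "int": lambda rng: rng.randint(1, 10_000),
    "float": lambda rng: round(rng.random() * 100, 2),
    "email": lambda rng: f"user{rng.randint(1, 999)}@example.com",
}


class Sink(ABC):
    """Minimal sink interface: a custom sink only has to implement publish()."""

    @abstractmethod
    def publish(self, records: List[dict]) -> None: ...

    def close(self) -> None:
        pass


class CSVSink(Sink):
    def __init__(self, stream: TextIO, fieldnames: List[str]) -> None:
        self.writer = csv.DictWriter(stream, fieldnames=fieldnames)
        self.writer.writeheader()

    def publish(self, records: List[dict]) -> None:
        self.writer.writerows(records)


class ListSink(Sink):
    """A trivial custom sink that keeps records in memory (handy in tests)."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def publish(self, records: List[dict]) -> None:
        self.records.extend(records)


def generate(config: dict, sink: Sink) -> None:
    rng = random.Random(config.get("seed"))
    schema = config["schema"]
    batch = [
        {field: FIELD_GENERATORS[kind](rng) for field, kind in schema.items()}
        for _ in range(config["num_records"])
    ]
    sink.publish(batch)
    sink.close()


if __name__ == "__main__":
    buf = io.StringIO()
    generate(CONFIG, CSVSink(buf, list(CONFIG["schema"])))
    print(buf.getvalue())
```

Keeping the sink contract to a single `publish()` method means adding a new destination (a Kafka producer, an HTTP webhook) is one small class, while the generator and config handling stay unchanged.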

GitHub: https://github.com/glassflow/glassgen

Docs: https://glassgen.glassflow.dev/

Would love feedback from the community, especially on:

1. Additional sink types that would be useful

2. Performance optimization opportunities

3. Ideas for handling more complex data relationships