Hacker News
by rekabis 127 days ago
At my last job they had an entire section of the product - hidden behind feature flags and only possible to enable in development - whose sole purpose was to generate dummy content for the different sections of the product. When I left, they were extending it with AI integration to have it generate more realistic data. The overall program - when in prod - contained massive amounts of PPI/PII, so we needed a way to generate massive amounts of realistic-looking dummy data to stress test it while in dev.

This is interesting.

How much overhead did that add to your development workflow? I'm curious if building and maintaining that parallel demo infrastructure became its own project, or if it stayed lightweight.

Also, did you use this for investor demos specifically, or more for development/QA?

It had almost no workflow overhead. Remember - a data generator has essentially no overhead aside from any rules-based constraints on the data (how realistic it should be, what patterns it needs to have), and the format in which this data is stored (invariably a database). It had no interactions with any part of the main application; the only thing it touched was the database.

This let it be “simple” in terms of how it generated content, and “complicated” only in terms of what content it needed to create and how that content was interconnected. Patient profiles were simple to define, but they were completely different from, say, the medications they were prescribed or the appointments that had been scheduled, or the connections between appointments and prescriptions.

So yeah, generating data is simple; defining what data should be generated, and in what patterns, was a lot more difficult. Sometimes things that should be related could only be generated in isolation from each other because of how that part of the generation tooling was assembled.
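
As a rough illustration of the idea, and not the tool described here, a minimal Python sketch of a generator that touches only the database might look like the following. Everything in it is an assumption made for the example: the three-table schema, the column names, the name and medication lists, and the helpers `generate` and `count_orphans` are all hypothetical. The generation itself is a few nested loops; the part that takes thought is deciding which rows must point at which other rows.

```python
import random
import sqlite3

# Hypothetical schema for illustration; the real product's tables are unknown.
SCHEMA = """
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE appointments (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    day_offset INTEGER NOT NULL
);
CREATE TABLE prescriptions (
    id INTEGER PRIMARY KEY,
    appointment_id INTEGER NOT NULL REFERENCES appointments(id),
    medication TEXT NOT NULL
);
"""

# Made-up sample values; none of these are real patients or real drugs.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Khan", "Novak"]
MEDICATIONS = ["Placebol", "Fakeprofen", "Dummycillin"]


def generate(conn, n_patients, seed=0):
    """Fill the database with linked dummy patients, appointments, prescriptions."""
    rng = random.Random(seed)  # seeded so a data set can be reproduced
    conn.executescript(SCHEMA)
    for _ in range(n_patients):
        name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        patient_id = conn.execute(
            "INSERT INTO patients (name) VALUES (?)", (name,)
        ).lastrowid
        # Each patient gets 0 to 3 appointments that reference the patient row.
        for _ in range(rng.randint(0, 3)):
            appointment_id = conn.execute(
                "INSERT INTO appointments (patient_id, day_offset) VALUES (?, ?)",
                (patient_id, rng.randint(0, 365)),
            ).lastrowid
            # Each appointment gets 0 to 2 prescriptions that reference it.
            for _ in range(rng.randint(0, 2)):
                conn.execute(
                    "INSERT INTO prescriptions (appointment_id, medication) "
                    "VALUES (?, ?)",
                    (appointment_id, rng.choice(MEDICATIONS)),
                )
    conn.commit()


def count_orphans(conn):
    """Count appointments or prescriptions whose parent row is missing."""
    orphan_appointments = conn.execute(
        "SELECT COUNT(*) FROM appointments a "
        "LEFT JOIN patients p ON p.id = a.patient_id WHERE p.id IS NULL"
    ).fetchone()[0]
    orphan_prescriptions = conn.execute(
        "SELECT COUNT(*) FROM prescriptions r "
        "LEFT JOIN appointments a ON a.id = r.appointment_id WHERE a.id IS NULL"
    ).fetchone()[0]
    return orphan_appointments + orphan_prescriptions


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    generate(conn, n_patients=1000)
    for table in ("patients", "appointments", "prescriptions"):
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(table, total)
```

Generating children inside the loop that creates their parent is what keeps the rows connected. If each table were filled by a separate, independent pass, you would get the "generated in isolation" problem mentioned above, where rows that should be related are not.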

This was almost 100% used by developers and QA. Outside demos had a special DB used by sales with much more consistent data, albeit much smaller. The generator was meant to create _large_ data sets, just not very _pretty_ data sets.