by openquery 2133 days ago

Thanks for the feedback. This is exactly what we're looking for.

> It is 100% easier for me to export a little production data to test on (and maybe sanitize), or to write a small script to generate a few users and those things I need to test.

In your case it may very well be. But for an organization whose schema has 100+ tables, with sensitive information scattered across them, this can become a nightmare to manage; I've seen this first hand. Furthermore, if you are trying to generate more than 'a little' data, it gets more complex: you have to create factories and write a lot of code to make the whole dataset coherent and tell a story. Whether to take on the added complexity of Synth is a trade-off that depends on how sophisticated your testing data needs to be.
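To illustrate the "factories" point, here is a minimal hand-written sketch for just two related tables. It is not how Synth works, and the `User`/`Order` tables and the `make_users`/`make_orders` helpers are hypothetical. Even at this size you have to invent non-sensitive values and keep foreign keys consistent by hand, and that code grows with every table in the schema.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class User:
    id: int
    email: str


@dataclass
class Order:
    id: int
    user_id: int  # foreign key into User.id
    amount_cents: int


def make_users(rng: random.Random, n: int) -> list[User]:
    # Synthetic emails, so no real personal data ends up in the test set.
    # (rng is unused here but keeps the factory signatures uniform.)
    return [User(id=i, email=f"user{i}@example.com") for i in range(1, n + 1)]


def make_orders(rng: random.Random, users: list[User], n: int) -> list[Order]:
    # Every order must point at an existing user to keep the dataset coherent.
    return [
        Order(id=i, user_id=rng.choice(users).id, amount_cents=rng.randint(100, 50_000))
        for i in range(1, n + 1)
    ]


rng = random.Random(0)
users = make_users(rng, 5)
orders = make_orders(rng, users, 20)
```

Multiply this by 100+ tables, plus realistic distributions and cross-table invariants, and the maintenance cost becomes the real problem.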

> If I did use it, i'd prefer it to be much easier to use

I think this misconception may stem from the fact that we use machine learning under the hood. We've spent a lot of time abstracting that away from the developer. In fact, you can run the whole lifecycle with a single command:

`synth model new --from-database <database-uri> --train --deploy`

> I would still be concerned about whether this is deterministically creating data or if its random?

At this point you can choose. You can either pick a seed from which the whole generation process starts (this may not be in production yet), which makes the output reproducible, or elect to seed it randomly.
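As a general illustration of the two modes (plain Python, not Synth's implementation; `generate_ages` is a hypothetical function), a fixed seed reproduces exactly the same data on every run. When no seed is given, this sketch draws one at random but returns it, so a randomly seeded run can still be replayed later. That last behaviour is a design choice of the sketch, not a claim about Synth.

```python
from __future__ import annotations

import random
import secrets


def generate_ages(n: int, seed: int | None = None) -> tuple[int, list[int]]:
    # No seed supplied: pick a random one, but hand it back so the run can be replayed.
    if seed is None:
        seed = secrets.randbits(64)
    rng = random.Random(seed)  # all randomness flows from this single seed
    return seed, [rng.randint(18, 90) for _ in range(n)]


seed_a, run_a = generate_ages(10, seed=42)  # deterministic
seed_b, run_b = generate_ages(10, seed=42)  # same seed, identical output
seed_c, run_c = generate_ages(10)           # randomly seeded
```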

Thanks for the great questions :)