Hey guys - here's some critical feedback from a fellow dev. This is my n of 1 perspective - of course it could look very different for e.g. large enterprise companies struggling with this.

Feedback: It seems overly complicated. You lost me when you said I have to train models. Are you assuming that software developers want to train machine learning models to do something as simple as creating some test data? In reality, I reach for tools that make things easier for me: not having to read a ton of documentation or download new external tools, and things that 'just work'.

It is 100% easier for me to export a little production data to test on (and maybe sanitize), or to write a small script to generate a few users and those things I need to test. Plus, then I know exactly what I'm going to get. A lot of the time, once I've done this, it will keep working for a good while - if I do change the schema, I can add some data for the new column and go from there.

For those companies that have 'messy' fixture data - is the tool the issue? My take is that the difficulty of maintaining the data could contribute, but it is more an issue of simply bad housekeeping - e.g. rushing and not tending the garden. While your system might handle this, it also seems to require a different skillset (e.g. specific training/knowledge) than the standard QA developer might have.

If I did use it, I'd prefer it to be much easier to use - if I could include a ruby gem and incorporate it into the testing process, e.g. an 'after' hook after migrating the db, that would be ideal. Then I don't really need to know much. However, I would still be concerned about whether this is deterministically creating data or if it's random?

Good luck!
> It is 100% easier for me to export a little production data to test on (and maybe sanitize), or to write a small script to generate a few users and those things I need to test.
In your case it may very well be. But for an organization whose schema has 100+ tables with sensitive information scattered across them, exporting and sanitizing production data can become a nightmare to manage. I've seen this firsthand. Furthermore, if you are trying to generate more than 'a little' data, things get more complex: you have to create factories and write a lot of code to make the whole dataset coherent and tell a story. I think taking on the added complexity of Synth is a trade-off worth weighing against the sophistication of the test data you need.
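To make the factory point concrete, here is a minimal sketch of the hand-rolled approach with just two related tables. The table names, fields, and helper functions are all hypothetical and have nothing to do with Synth's internals. Even at this size the orders factory has to know about users to keep foreign keys valid, and that coupling grows with every table you add.

```python
import random

# Hypothetical hand-written factories for two related tables.
# Names and fields are made up for illustration only.

def make_users(n):
    """Build n user rows with sequential ids."""
    return [
        {"id": i, "name": f"user{i}", "email": f"user{i}@example.com"}
        for i in range(1, n + 1)
    ]

def make_orders(users, n):
    """Build n order rows, each pointing at an existing user id."""
    return [
        {
            "id": i,
            "user_id": random.choice(users)["id"],  # keeps the foreign key valid
            "total_cents": random.randint(100, 50_000),
        }
        for i in range(1, n + 1)
    ]

users = make_users(5)
orders = make_orders(users, 20)
```

With two tables this is trivial. With 100+ tables, every factory needs this kind of awareness of its parents, plus realistic value distributions if the data is meant to 'tell a story'.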
> If I did use it, I'd prefer it to be much easier to use
I think this impression comes from the fact that we use machine learning under the hood. We've spent a lot of time abstracting that away from the developer. In fact, you can run the whole lifecycle with one line:
`synth model new --from-database <database-uri> --train --deploy`
> I would still be concerned about whether this is deterministically creating data or if it's random?
At this point you can choose: either pick a seed that the whole generation process starts from (this may not be in production yet), or let it be seeded randomly.
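For anyone unfamiliar with the seeded-versus-random distinction, here is a rough sketch of the general idea using Python's standard `random` module. This is not Synth's implementation or API. The `generate_users` function and its fields are made up for illustration.

```python
import random

def generate_users(n, seed=None):
    """Generate n fake user rows.

    With an integer seed the output is reproducible across runs;
    with seed=None the generator is seeded from system entropy.
    (Hypothetical helper, not part of Synth.)
    """
    rng = random.Random(seed)  # private generator, independent of global state
    names = ["alice", "bob", "carol", "dave", "erin"]
    return [
        {"id": i, "name": rng.choice(names), "age": rng.randint(18, 90)}
        for i in range(1, n + 1)
    ]

# Same seed, same data: useful when a test must be repeatable.
fixed_a = generate_users(5, seed=42)
fixed_b = generate_users(5, seed=42)

# No seed: the rows will generally differ from run to run.
varied = generate_users(5)
```

A fixed seed gives you the 'I know exactly what I'm going to get' property of a hand-written fixture, while the unseeded mode is closer to fuzzing.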
Thanks for the great questions :)