by n4atki 849 days ago
Curious what a good end-to-end solution looks like for you? Is it more about ease-of-use (import/export with minimal effort) or is there a privacy layer that's missing?

I see it in 4 steps: 1. Connect to a source db to import your data 2. Train a Gen AI using the source data 3. Use it to create synthetic data 4. Export synthetic data into a new db

The SDV team is working on business solutions to cover the full use case. You can use the public SDV to validate steps 2 and 3.

1 comment

it's not necessarily about the privacy layer per se. the workflow i had in mind is as follows:

1. spin up a production-equivalent database (eg: mysql container instead of prod RDS)

2. point a process/binary (maybe a simple container) to:

-- source db (RDS)

-- sink db (mysql container)

-- transformation function (that may use gen AI, etc) to seed sink db with synthetic/anonymized data [there may be some parallel process to enable testing of this transformation function]

3. profit (use this for dev etc)

The key here would be speed in step (2) if the entire pipeline were to run end-to-end on demand. do you have any examples of using SDV to achieve this? it's quite possible there's already something in the docs that I have missed
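To make the source / sink / transformation-function idea concrete, here is a rough sketch of step (2). Everything in it is an assumption for illustration: it uses in-memory SQLite in place of RDS and the mysql container, the names `seed_sink` and `shuffle_columns` are made up, and the transform is a naive per-column shuffle rather than SDV or any gen AI model. A column shuffle is not real anonymization; it only marks the slot where a proper synthesizer would plug in.

```python
import random
import sqlite3
from typing import Callable

Row = tuple
Transform = Callable[[list[Row]], list[Row]]


def shuffle_columns(rows: list[Row], seed: int = 0) -> list[Row]:
    """Hypothetical stand-in transform: shuffle each column independently.

    Keeps each column's values but breaks the link between values in a row.
    This is NOT a privacy guarantee, just a placeholder for a real model.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def seed_sink(
    source: sqlite3.Connection,
    sink: sqlite3.Connection,
    table: str,
    transform: Transform,
    batch_size: int = 1000,
) -> int:
    """Stream rows from source, transform them, and insert them into sink.

    Assumes the sink already has a table with the same name and column
    order (step 1 of the workflow). Returns the number of rows written.
    """
    cursor = source.execute(f'SELECT * FROM "{table}"')
    placeholders = ", ".join("?" * len(cursor.description))
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
    written = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        # The transform only sees one batch at a time, so memory stays bounded.
        transformed = transform(batch)
        sink.executemany(insert_sql, transformed)
        written += len(transformed)
    sink.commit()
    return written


if __name__ == "__main__":
    schema = "CREATE TABLE users (name TEXT, age INTEGER)"
    source = sqlite3.connect(":memory:")  # stands in for prod RDS
    sink = sqlite3.connect(":memory:")    # stands in for the mysql container
    source.execute(schema)
    sink.execute(schema)
    source.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ann", 31), ("bob", 45), ("cy", 27)],
    )
    source.commit()
    print(seed_sink(source, sink, "users", shuffle_columns))
```

Because the transform is just a function from a batch of rows to a batch of rows, it can be tested on its own without either database, which is roughly the "parallel process" for testing mentioned in step (2). Batching keeps memory flat, but a model that needs to see the whole table to train would have to be fitted before this loop runs.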