by carlps 2125 days ago
I'm curious how the model handles text data. Does it use the actual input text from the source db to generate new synthetic data? If I have a column of a bunch of sensitive text that I need sanitized, how will that appear in the output? What is the risk of leaking something sensitive?

Thanks for the question!

For now, text columns are marked as either `categorical` or `text`. For sensitive data you'll want `text`, which provides a lorem-ipsum-style generator.

If the model has classified that column with the semantic type `text`, no information from the column should be leaked :)
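
To make the idea concrete, here is a rough sketch of what a lorem-ipsum-style generator for a `text` column could look like. This is not the tool's actual code: `lorem_column`, `LOREM_WORDS`, and the sample rows are hypothetical names and data made up for illustration. The point is that only the row count is taken from the source, so the original strings never reach the output.

```python
import random

# Fixed filler vocabulary; nothing here comes from the source database.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def lorem_column(n_rows, min_words=3, max_words=8, seed=None):
    """Return n_rows strings of filler text (hypothetical helper).

    The source values are never passed in, so they cannot leak.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        n_words = rng.randint(min_words, max_words)
        rows.append(" ".join(rng.choice(LOREM_WORDS) for _ in range(n_words)))
    return rows


# Made-up sensitive column, for illustration only.
source = ["Patient John Doe, SSN 123-45-6789", "Call Jane at 555-0100"]

# Only the number of rows is read from the source column.
synthetic = lorem_column(len(source), seed=0)
```

A `categorical` column, by contrast, would presumably reuse the values it saw, which is why it is the wrong choice for free-form sensitive text.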