by carlps
2125 days ago
I'm curious how the model handles text data. Does it use the actual input text from the source db to generate new synthetic data? If I have a column of a bunch of sensitive text that I need sanitized, how will that appear in the output? What is the risk of leaking something sensitive?
For now, text data is marked as either `categorical` or `text`. For sensitive data you want `text`, which uses a lorem-ipsum-style generator instead of the original values.
If the model has classified a column with the semantic type `text`, no information from that column should be leaked :)
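
To make the difference concrete, here is a rough sketch of the two behaviours. It is not the tool's actual code or API: `synth_categorical`, `synth_text`, and `LOREM_WORDS` are hypothetical names made up for illustration, and the sample rows are invented. The assumption is that a `categorical` column is resampled from the values seen in the source, while a `text` column is filled from a fixed filler vocabulary and never reads the source.

```python
import random

# Hypothetical filler vocabulary (an assumption, not taken from any real tool).
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def synth_categorical(column, n, rng):
    """Resample n values from the observed column, so original strings reappear."""
    return [rng.choice(column) for _ in range(n)]


def synth_text(n, rng, min_words=3, max_words=8):
    """Build n filler strings from LOREM_WORDS only; the source column is never read."""
    rows = []
    for _ in range(n):
        length = rng.randint(min_words, max_words)
        rows.append(" ".join(rng.choice(LOREM_WORDS) for _ in range(length)))
    return rows


# Invented example rows standing in for a sensitive column.
source = ["Patient John Doe, SSN 123-45-6789", "Call Jane at 555-0100"]

rng = random.Random(0)
as_categorical = synth_categorical(source, 5, rng)  # every row is a verbatim source value
as_text = synth_text(5, rng)                        # every row is filler only

print(as_categorical)
print(as_text)
```

Under these assumptions, a sensitive column typed as `categorical` would show up verbatim in the output, while the same column typed as `text` would contain only filler words.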