by woyten 754 days ago
Hello, community!

I am Vadim Voitenko, a database engineer and the developer of Greenmask (https://github.com/GreenmaskIO/greenmask), an open-source tool for anonymizing data in PostgreSQL. During my R&D work, I realized that database anonymization is rarely discussed, even though many companies run into problems with it. So I decided to write a series of articles about data anonymization and the common problems people face. Here is the first one (https://hackernoon.com/database-anonymization-the-basics). I hope it lays the foundation for understanding the problem and gives some useful examples.

I would also like to highlight several features of the Greenmask project that may interest you:

* Validation: Validation commands let you check transformation differences, constraint violations, and schema changes.

* Highly Customizable Transformations: Built-in transformers can be tuned in detail, for example random email generation (https://greenmask.io/v0.2beta1/built_in_transformers/standar...).

* Database Type Safety: Built on the pgx driver, so values are encoded and decoded accurately according to their PostgreSQL types.

* Dynamic Parameters: Transformer parameters can take their values from other columns of the same record, which resolves functional dependencies between columns (https://greenmask.io/v0.2beta1/built_in_transformers/dynamic...).

* Deterministic and Random Engines: Transformations can run on a deterministic engine, where the same input always produces the same output, or on a random engine (https://greenmask.io/v0.2beta1/built_in_transformers/transfo...).

* Single Utility: Easy to set up and run.

* Schema Dumping and Restoration: Delegates these tasks to PostgreSQL's native utilities (pg_dump and pg_restore), since they are the most reliable tools for the job.

* Backward Compatibility: Dumps created by Greenmask can be successfully restored by pg_restore.

* Large Objects Support: Handles dumping and restoring large objects (https://www.postgresql.org/docs/current/largeobjects.html).

* Control Schema Changes: Checks your schema changes against the previous snapshot, prints the differences, and returns a non-zero exit code if changes are detected. This feature can be helpful in your CI/CD pipelines.

There is still a lot of work ahead. Have a look at the planned features on our roadmap (https://github.com/orgs/GreenmaskIO/projects/6).

We have prepared a playground sandbox for experimenting with and evaluating Greenmask. You can access it here (https://greenmask.io/v0.2beta1/).

To run the Greenmask playground for the beta version, clone the repository and execute:

git clone https://github.com/GreenmaskIO/greenmask.git

cd greenmask

git checkout tags/v0.2.0b1 -b v0.2.0b1

docker-compose run greenmask-from-source

I would appreciate your feedback.