Show HN: VeilStream – prod-like data without the PII (app.veilstream.com)
22 points by joram87 361 days ago
# TL;DR

We built VeilStream, a drop-in, read-only PostgreSQL proxy that strips, masks, or anonymizes sensitive values as queries stream through. In less than two minutes, you can put the proxy in front of a PostgreSQL database, whether it's hosted on your laptop, Neon, Supabase, or a cloud provider, and start configuring filter rules.

The use cases we're trying to solve are:

- Production-like data in development environments

- Improve incident handling by masking all data that is not relevant

- Share a subset of your data

- Protect data being shipped into a data lake

- Safe data to expose in internal tooling, metrics, or BI dashboards

- Empower non-technical staff to vibe-code against sanitized data

# How it fits in your stack

- Role-based policies: define masking rules in our web dashboard.

- The proxy picks up the configuration and starts applying rules automatically.

## You host it

- It's a Docker container configured with two environment variables: an API key and the database connection URI.

## We host it

- Drop-in proxy: no code changes. Point your connection string at a new endpoint, that's it.
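For the hosted option, the client-side change amounts to swapping the host and port in the connection URI. A minimal sketch in Python, assuming a hypothetical proxy endpoint `proxy.example.com:5432` and a made-up database URI (the real endpoint is not given in this post, and whether credentials stay the same is also an assumption):

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_proxy(db_uri: str, proxy_host: str, proxy_port: int) -> str:
    """Return db_uri with only the host and port swapped for the proxy's."""
    parts = urlsplit(db_uri)
    # Keep the user:password prefix as written (everything before the last '@').
    userinfo, sep, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{sep}{proxy_host}:{proxy_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical values, for illustration only.
original = "postgresql://app:secret@db.internal:5432/shop?sslmode=require"
proxied = point_at_proxy(original, "proxy.example.com", 5432)
print(proxied)
```

The database name, query parameters, and application code stay untouched, which is what "no code changes" means in practice here.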

# How it works (and how fast it is)

The proxy restructures the query AST based on the config. AST rewrites depend on the text and structure of the query, not on how many rows the database eventually returns, so the rewrite cost is effectively O(1) with respect to result size.
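To make the idea concrete, here is a toy sketch of a query rewrite in Python. It is not VeilStream's implementation: the `Select` class, the `MASK_RULES` mapping, and the masking expressions are all hypothetical, and a real proxy would work on a full SQL parse tree. The point it illustrates is that the rewrite loops over the columns named in the query, never over result rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Select:
    """A toy AST for `SELECT <columns> FROM <table>`."""
    columns: tuple[str, ...]
    table: str

    def to_sql(self) -> str:
        return f"SELECT {', '.join(self.columns)} FROM {self.table}"


# Hypothetical rules: column name -> SQL expression that replaces it.
MASK_RULES = {
    "email": "'redacted@example.com'",
    "name": "left(name, 1) || '***'",
}


def rewrite(query: Select, rules: dict[str, str]) -> Select:
    """Swap each sensitive column for its masking expression, keeping the output name."""
    masked = tuple(
        f"{rules[col]} AS {col}" if col in rules else col
        for col in query.columns
    )
    return Select(masked, query.table)


query = Select(("id", "name", "email"), "users")
print(rewrite(query, MASK_RULES).to_sql())
```

The work done by `rewrite` grows with the number of selected columns, so it costs the same whether `users` holds ten rows or ten million.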

# Status & feedback wanted

VeilStream is GA, but billing isn’t switched on yet so it's currently free at all tiers. We’d love your thoughts on:

- Throughput / latency in real workloads

- Filter rules & DevX

- Weird edge-case queries (pg_dump, logical replication, etc.)

I’ll be around all day to answer questions and dig into issues.

# tagline

Ship features with data you can trust and privacy you don't have to worry about.

5 comments

This looks awesome! A couple questions though:

How do you handle connection pooling? Does this interfere with pgbouncer or similar tools?

Also, does this work with all PostgreSQL extensions (PostGIS, timescaledb, etc.)?

Good questions.

We do not do connection pooling yet. Currently it's a fresh connection per query (which adds a bit of latency). We intend to add basic connection pooling shortly after launch. That said, if you put it in front of pgbouncer, that would work well.

PostGIS and other extensions are on the radar, but are not fully supported yet. The proxy works with the extensions, but can't mask their data yet. If we get requests for specific extensions to be fully supported, we'll implement them (same with extra masking data types). I look forward to the GIS data implementation, as I've met one of the PostGIS contributors and have discussed several of those masking complexities.

Very cool. Is there a way to only mask/obfuscate some of the data? E.g., mask the email in rows where the country column is country X?
Oh yes! We made that possible through conditionals. We default to unconditional modification, but if you toggle the conditional option, you can provide a list of conditions which, if they all pass, trigger the modifications.

A future improvement: currently the conditions are all ANDed together, and I'd like to support more types of boolean logic. :)
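The all-conditions-must-pass behavior can be sketched in a few lines of Python. This is only an illustration on in-memory rows, not how the proxy applies rules: the function name `mask_if_all`, the equality-only conditions, and the sample row are assumptions.

```python
def mask_if_all(row: dict, conditions: list[tuple[str, object]],
                column: str, replacement: object) -> dict:
    """Replace row[column] only when every (column, expected) condition matches."""
    if all(row.get(col) == expected for col, expected in conditions):
        return {**row, column: replacement}
    return row


# Hypothetical row and rules, for illustration only.
row = {"email": "ada@example.org", "country": "DE", "plan": "free"}

# One condition: mask the email when country is DE.
print(mask_if_all(row, [("country", "DE")], "email", "redacted"))

# Two conditions ANDed together: the second one fails, so nothing is masked.
print(mask_if_all(row, [("country", "DE"), ("plan", "pro")], "email", "redacted"))
```

With an empty condition list, `all()` is true and the value is always replaced, which lines up with unconditional modification being the default.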

This is very cool to see. I love that it is built on Postgres. Have you tried it paired with something like Postgres.AI's DBLab engine? It seems like that could be really powerful and I don't see any reason why it wouldn't work off the bat.
We've not tried it with DBLab Engine. That would be an excellent combo (from my quick reading up on it). I'll add experimenting with the pairing to my todo list. It does look like there's some overlap in functionality, but mostly they are symbiotic.
Super promising. What data types can VeilStream handle? Like, can I mask nested jsonb, UUIDs, IP addresses, arrays? Would be wild if adding new filters was fast enough to support weird internal schemas or bespoke PII.
- jsonb: kinda. We do static JSON replacement, with more complex rules on the horizon, where you could replace some regex-like path with a random function.

- UUIDs: no, but I should. Adding to my list :)

- IP addresses: yes, IPv4 and IPv6, but I want to go further and let you configure the replacement IPs to be within specified CIDR blocks.

- arrays: again, not yet. Do you mind if I ask about the use case? Arrays are commonly modeled as individual rows with foreign keys/lookups, which we can do.
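On the IP address item, keeping replacement addresses inside a chosen CIDR block could look roughly like the Python sketch below. This is an assumption about one possible approach, not a description of VeilStream's filter: the `mask_ip` function and the hash-based mapping are hypothetical, and only the standard-library `ipaddress` and `hashlib` modules are used.

```python
import hashlib
import ipaddress


def mask_ip(original: str, cidr: str) -> str:
    """Map an IP address to a stable replacement address inside `cidr`."""
    network = ipaddress.ip_network(cidr)
    address = ipaddress.ip_address(original)
    if address.version != network.version:
        raise ValueError("address and CIDR block must share an IP version")
    # Hash the raw address bytes so the same input always maps to the same output.
    digest = hashlib.sha256(address.packed).digest()
    offset = int.from_bytes(digest, "big") % network.num_addresses
    # Indexing a network yields the address at that offset (may include the
    # network or broadcast address; a real filter might want to skip those).
    return str(network[offset])


# Hypothetical inputs, for illustration only.
print(mask_ip("203.0.113.7", "10.0.0.0/8"))
print(mask_ip("2001:db8::1", "fd00::/8"))
```

A plain unsalted hash like this is deterministic, which helps keep joins consistent, but the IPv4 space is small enough to brute-force, so a keyed hash would be the safer choice in practice.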

We've internally got the path for adding new filter types (dashboard configuration, API-layer storage, and proxy rule implementation) pretty optimized. It takes us a day or two to add simple requested filters, longer for more complex ones.

Beauty. For arrays was thinking of semi-structured stuff in audit logs or embedded tags, but yeah, can probably reshape upstream. Awesome to see the path to custom filters is already so streamlined.
re: streamlining the custom filters

We were considering allowing users to inject stored procedures themselves and then use those, but currently we're opting to implement them ourselves, so we have better control over the user experience. In the future, for very custom stored procedures, I think we may allow the custom path.

Is there a GitHub link or something? Right now when I follow the link, all it wants is for me to "log in" before showing anything at all.
Ah yeah, I linked the web app. The landing page is here: https://www.veilstream.com/ Currently we don't have much publicly on the GitHub page.