Hacker News
by ljm 171 days ago
Reminds me a bit of Snaplet before it embarked on its incredible journey to get acquired by Supabase and shut down.

I like the concept, but the pain point has never been creating realistic-looking emails and the like; it's creating data that is realistic in terms of the business domain and in terms of volume.


Appreciate the Snaplet comparison, they were doing good work. You're right that realistic looking strings are the easy part. We're focused on relational integrity first (FKs, constraints, realistic cardinality), but business domain logic is the next layer. What kinds of rules would be useful for you? Things like weighted distributions, time-based patterns, conditional relationships?
The realistic cardinality is actually a good start (the problem with things like using Faker for DB seeds being that everything is entirely too random).

If one were able to use metrics as a source then, depending on the quality of the metrics, it might be possible to distribute data in a manner similar to what's observed in production? You know, some users that are far more active than others, for example. Considering a major issue with testing is that you can't accurately benchmark changes or migrations based on a staging environment that is 1% the size of your prod one, that would be a huge win I think, even if the data is, for the most part, nonsensical. As long as referential integrity is intact, the specifics matter less.
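
A rough sketch of what I mean by "some users far more active than others", in Python. Everything here is made up for illustration: the `seed_events` helper, the `skew` parameter, and the rank-based (Zipf-like) weighting are assumptions, not something taken from a real tool or from real production metrics. Every generated row still points at a valid user id, so referential integrity holds even though the rows themselves are nonsense.

```python
import random
from collections import Counter

def seed_events(n_users, n_events, skew=1.2, seed=0):
    """Assign each event to an existing user id with Zipf-like weights."""
    rng = random.Random(seed)  # fixed seed so seeding is reproducible
    user_ids = list(range(1, n_users + 1))
    # Assumed weighting: the user at rank r gets weight 1 / r**skew,
    # so low-numbered users end up far more active than the rest.
    weights = [1 / (rank ** skew) for rank in range(1, n_users + 1)]
    owners = rng.choices(user_ids, weights=weights, k=n_events)
    # Every user_id comes from user_ids, so the foreign key is always valid.
    return [{"id": i + 1, "user_id": uid} for i, uid in enumerate(owners)]

events = seed_events(n_users=100, n_events=10_000)
per_user = Counter(e["user_id"] for e in events)
print(per_user.most_common(3))
```

In a real setup the weights would come from observed metrics rather than a formula, but the generation step would look much the same.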

Domain specific stuff is harder to describe I think. For example, in my setup I'd want seeds of valid train journeys over multiple legs. There's a lot of detail in that where the shortcut is basically to try and source it from prod in some way.

This is useful. What if you ran a CLI locally that extracts just the statistical profile from prod (cardinality, relationship ratios, etc.) and uploads that? We'd never touch your database; you just hand us the metrics and we match the shape.
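
To make that concrete, here is a minimal sketch of the kind of profile such a CLI might emit. It is only an illustration, not our implementation: the `profile_relationship` function, the output field names, and the `users`/`orders` demo schema are hypothetical, it assumes the parent table's primary key is called `id`, and it uses an in-memory SQLite database as a stand-in for prod. Only aggregates come out; no row values do.

```python
import json
import sqlite3
import statistics

def profile_relationship(conn, parent, child, fk):
    """Summarize one parent -> child relationship using aggregates only."""
    # Table and column names are interpolated into SQL here, so they must
    # come from trusted config, never from user input.
    per_parent = [
        row[0]
        for row in conn.execute(
            f"SELECT COUNT(c.{fk}) FROM {parent} p "
            f"LEFT JOIN {child} c ON c.{fk} = p.id GROUP BY p.id"
        )
    ]
    if not per_parent:
        return {"parent_rows": 0, "child_rows": 0}
    return {
        "parent_rows": len(per_parent),
        # Counts only children attached to a parent (NULL FKs are excluded).
        "child_rows": sum(per_parent),
        "mean_children": statistics.mean(per_parent),
        "median_children": statistics.median(per_parent),
        "max_children": max(per_parent),
        "childless_fraction": per_parent.count(0) / len(per_parent),
    }

# Hypothetical demo schema standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (1), (1), (2);
""")
profile = profile_relationship(conn, "users", "orders", "user_id")
print(json.dumps(profile, indent=2))
```

The resulting JSON is the only thing that would need to be shared, and it could be inspected before anything is uploaded.
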
We do exactly that in one of our products. It's called data profiling.
I'd be willing to try that out :) a CLI would be great, even as a sandbox tool
Really appreciate the input. I'll make sure to give you early access once we implement this, and I'll keep you posted.
Hey! Snaplet founder here. Want to clarify that it was not acquired by Supabase; I shut down the startup and found roles for some of the team at Supabase.

The code remains:

- https://github.com/supabase-community/seed
- https://github.com/supabase-community/copycat
- https://github.com/supabase-community/snapshot

This looks like a great project, wishing them all the best on the journey.

Thanks!! Means a lot coming from you. Best of luck at Supabase.
Thanks, but I am not at Supabase! I ended up going back to building RedwoodJS and took over the project, and now have a consultancy.