Hacker News new | ask | show | jobs
by robert_tweed 4312 days ago
Generally the hardest thing with version control on a database (for an evolving codebase) is separating unrelated changes - such as schema changes vs content updates - and branching and merging those changes in sync with the code dependencies. Another issue is non-destructively replaying development changes into test/production environments.

So for example, you might have a feature branch that includes some schema changes and some value modifications, and a content branch that includes a bunch of inserts into a few content tables that happen to include foreign key references to each other (so you need to maintain referential integrity when replaying those updates/inserts).

I don't see anything in the description that indicates this tool address those problems. For me, those are really the only problems that a DB version control system ought to be focused on. Speed of snapshotting is not all that important in a development environment as you typically work on a cut-down dataset anyway. A minute or so to take a snapshot a few times a day isn't a huge deal, whereas taking more frequent snapshots doesn't seem like something that adds any value, if it doesn't address any of the other problems.

4 comments

"Another issue is non-destructively replaying development changes into test/production environments."

"...you need to maintain referential integrity when replaying those updates/inserts"

I think most of your misgivings about this tool are grounded in the expectation that it would provide merge features like Git. Since it claims to be "git for databases", I suppose that's a fair expectation.

However, the tool does provide the core functionality of Git, which is to implement a tracking system for hash-tagged objects (or, in this case, tables). This does have value. Not only is it faster than standard db dumps, it's also more space-efficient. For some people, this could be really valuable.

I think it's an interesting tool and could serve as a launching point for more powerful/useful functionality.

This sounds like something Sqitch would help with.

http://sqitch.org

FWIW, I found shmig [1] much simpler than sqitch.

Last I used sqitch, it returned a zero value on error, which made me leery of using it for scripted deployments.

[1] https://github.com/naquad/shmig

Please file a bug report. https://github.com/theory/sqitch/issues.
Sqitch sounds like just another migration tool, not a DB change tracking tool.

I have been looking for a tool that will allow me to track, diff and revert changes to the content of specific "business logic" tables so that we can acurately track and test those changes.

It doesn't look like anything like that exists, so eventually I'll have to roll my own.

I back up this recommendation whole-heartedly. Sqitch is a fantastic database tool. I stopped using Rails-type migrations because of it (I write PL/pgSQL functions and wanted a migration tool to handle that well, amongst other reasons.)
I would disagree that referential integrity and schemas are the only issues a DB version control should focus on.

For example, I would be very interested in having git like branching on top of something like this - http://sandbox.substance.io/docs/lorem_ipsum.json. Basically version controlled schemaless object tree.

I recently implemented something like this, backed by mongodb, and by exposing a HTTP api which mimicked git. I had to relax these two requirements, but it is still worth it.

However I would be very much interested in using libgit2 with a database backend instead of filesystem.

EDIT: not affiliated with substance in any way.

You can implement a change DAG for a DB inside the DB. It works pretty well.
What is a change DAG?
A directed graph showing the dependencies between changes.
I'd love a tool that solves that problem. Schema / data / user / permission migrations, support for both production and development environments and integration with CM tools like ansible would be fantastic.