Hacker News new | ask | show | jobs
by bongobingo1 1148 days ago
My only hesitation with methods like this is it ends up splitting the business rules into two places, where one is sort of obscured.

It's obvious to look at `add_new_payment` for the code that runs when adding a new payment, but then the code isn't there, so you have to know/ask or search in either migrations, a fresh structure dump or poke at the actual db (!).

I think they're great for other, well, effects when needed. PostgreSQL is a real powerhouse.

7 comments

Yeah I was expecting the post author to discuss the trade-off being made here because it’s really important to do so. The biggest complaint I have with these pithy articles is that they try to sell you on a particular trade-off without explaining what the deal is. It makes me think the author:

1. Just discovered this. 2. Implemented a bunch of them. 3. Hasn’t been maintaining this solution for more than a couple of months.

This is a trade-off that only manifests itself after maybe a couple years when you’ve built a system and you have many hands maintaining it. Either there’s some performance problem that makes it worth it or you’ve just obfuscated half your code for no reason. In the latter case those many hands are going to wreak havoc on your system before you figure out how to make it maintainable.

> The biggest complaint I have with these pithy articles is that they try to sell you on a particular trade-off without explaining what the deal is.

This is basically the clickbait in the wider world affecting software development, even though it might not look like it. Boiled down to the essentials, we are telling each other the software version of "Here's How to Lose 10 Pounds in Time for Summer." way more often than "How To Balance Diet and Exercise to Remain Healthy Over A Decade".

But it's up to us to those of us who like more discussion to reverse these trends. In that vein -- do you have good sources that typically talk about tradeoffs with technology rather than promoting a specific one? Kleppmann's book on Data Intensive applications is one I have found in the past.

It’s unfortunately hard for me to point to a source because everything I know I absorbed from more experienced engineers on hardware projects.
I'm not sure if the post was updated after your comment, but from the article:

"Triggers should be used with caution since they can obscure critical logic and create an illusion of automatic processes. While this can be advantageous in certain scenarios, it can also pose a challenge in terms of debugging, testing, and monitoring since they are not readily visible to developers."

DB upgrades are a pain. Triggers must be deployed with “UPDATE” instructions, it’s not as easy as keeping code in git.
Triggers are DDL, unless you've implemented them in the app layer.
I think this problem can be robustly solved if you have the right mechanisms in place:

1. Migrations. Your schema needs to live in version control, and changes to your schema must be applied by an automated system. Django migrations are the gold standard here in my opinion, but you can stitch together a custom system if you need to, one that tracks which migrations have been run already and provides a mechanism to apply new ones.

2. Automated tests. Your triggers MUST be covered by automated tests - call them unit tests or integration tests depending on your preferences, but they need to be exercised. Don't fall into the trap of mocking your database in your tests!

With these two measures in place I'm 100% comfortable using triggers that split my business logic between my application code and my database code.

I don't think migrations (at least as done by Django et al.) solve it - you want a declarative source of truth for what the schema looks like today, not a chain of changes that only tell you that after computing the combined effect.

Even if they just created a generated file of the final schema, that sat in version control and errored the makemigrations check (just like a missing migration) if it was out of sync, that would be a significant improvement IMO. But I think the Django maintainers at least would say they want the ORM DSL to be that. (But it's way too incomplete, you'd be limited to a tiny subset of postgres, and even then you have to be on board with that being a reasonable description of your schema, not wanting the actual db schema anywhere.)

If you really, really need to be able to see a SQL schema representing the current state, a cheap trick is to run an automation on every deploy that snapshots the schema and writes it to a GitHub repository.

I do a version of that for my own (Django-powered) blog here: https://github.com/simonw/simonwillisonblog-backup/blob/main...

This comes up a lot, and I'm always surprised that more people don't know about Flyway and "repeatable migrations": https://flywaydb.org/documentation/tutorials/repeatable

> Repeatable migrations are very useful for managing database objects whose definition can then simply be maintained in a single file in version control. Instead of being run just once, they are (re-)applied every time their checksum changes.

One of my pet projects addresses these points you raise about Django + migrations, have a look:

https://pypi.org/project/DBSamizdat

In a nutshell it allows you to keep DB functions, triggers and views around as Python classes, so you can version them together with the rest of your application code. The DB state gets updated for you (in the right dependency order) whenever you change them.

It can also run without Django.

This is exactly what I would use dbt for
Had a thought recently. What's stopping someone from separating table migrations from others like functions/triggers?

The standard tables can use the standard process while triggers, etc are run declaratively. When you change something, a tool could simply tear everything down and rebuild it. No need to worry about data since those are handled separately.

Would performance be a concern? Or is there something I'm missing?

> you want a declarative source of truth for what the schema looks like today, not a chain of changes that only tell you that after computing the combined effect

Applying a series of migrations to get a final db schema is not much different than a version control system like git.

Ok, so how do I checkout a version and view the file/schema tree?

If you like, you can view my comment above as saying 'if they included that tooling, not just the similar tree of changes stored, it would be better'.

How to get from one state to the next is interesting to the computer, not to me (not after I've initially written it/done it in prod anyway), I'm interested in the result, where do we stand after all of the migrations (currently on disk having checked out whatever).

You checkout that version, then run the migrations from scratch against a fresh database.

In Django that's "./mange.py migrate".

And then inspect the db, exactly. In other words you can't, there is no declarative 'current state' checked in or provided by the migration tooling.
It’s completely different as git does not store deltas but snapshots of whole files and file trees.
> Django migrations are the gold standard here in my opinion

Intriguing. In JavaWorld it's really only Flyway and Liquibase, and the important self-imposed rule, "Thou shalt never remove a column from a table."

I've always resisted the Django migration approach personally. One of the things I disliked most about it.
But what problem is moving to triggers solving? Is keeping all your logic in a single place actually a problem, or is moving part of it to Postres tempting because it's nifty? How much training will you have to provide for new hires to be able to run with Postgres triggers instead of just Django logic?
Triggers protect you against bugs.

If your application logic needs to remember to update a denormalized column somewhere it's very easy for a bug to slip in in the future when someone adds a new piece of Python code but forgets to update the denormalized column.

With triggers you can eliminate that source of bugs entirely.

This is even more valuable if you have more than one system interacting with a single database.

Triggers can also be significantly more performant depending on what you are doing.

But yeah having the business logic split like this is a hard sell. There needs to be an extremely good reason to do it. It adds developer overhead because not only do you have to maintain another test harness and deal with an entirely separate system encapsulating yours, every time you touch business logic you’re now wasting some brainpower deciding where it should live.

I did this with robust DB constraints and selective use of triggers and stored functions. It can feel like a strait jacket long term. Though it did help protect against less competent DBAs breaking things and avoid some footguns.
Table triggers are the ultimate foot gun in this respect. They are highly obscured! Stored procedures split your business logic too, but when you want to go and look at your database logic, it’s at least where you’re expecting to find it, and not built into a table.

I’d highly recommend people avoid them, unless you feel that you really need them _and_ you have very robust development processes. As soon as you deploy your first table trigger, from that moment you have to check every DML statement for unintended side effects.

> from that moment you have to check every DML statement

If that's the case, you have a documentation problem. It should be easy to decide if there are side effects or not just by looking at the DML and the metadata you need for it.

In fact, the case here is that the trigger on the article is an incredibly bad one. It's not a natural consequence of the table, or the database structure. It's probably not even always true to the business rules, what is the one property triggers must have no matter what.

The problem isn’t really that it’s hard to document. The problem is that when you don’t use triggers, you don’t have to consult documentation to know what’s going to happen after an insert (or an update, or delete…). People who haven’t used triggers would assume that a row will simply be inserted. But if you use triggers this is no longer a safe assumption.
Business rules are already often in multiple separated places in a codebase. If you mean "it should be in the same git repository", then triggers should be already in migrations.

But IMO, even migrations are a stopgap solution to this problem. In some cases, such as in ORMs like Rails' ActiveRecord, even the column names aren't present in the model by default, they are only migration files.

The real solution is putting the definition of the triggers (and all other database things too) closer to the code that operates on them. In models, for example. And models should enforce that only the defined triggers are present in the database. Of course, since most ORM users aren't really big DB users, this isn't exactly common. Until we get features like this, I agree that triggers can be confusing.

If you are using some JPA implementation, eg Hibernate, then JPA events are more or less what you are talking about except JPA can't enforce the condition that JPA-defined events are the only triggers present. If you write custom triggers then you run the risk of stale entity data being presented to the user.

But otherwise, this fulfills the condition of triggered events residing in the codebase. Of course, this means you have to be comfortable with using an ORM in the first place.

You can say similar things about any API based development though. Triggers are just a part of the database API.

Databases exist to manage data, not just store and retrieve it.

it is also a very restrictive environment, you either only trigger on a single in transaction table, or you risk having triggers tripping other triggers, limiting the approach scale anyway. at which point a well managed transaction from an active record or a data gateway will do miles better.
You have exactly the same problem in microservice architecture while calling a remote endpoint.