Hacker News new | ask | show | jobs
by magicalhippo 619 days ago
Not OP but in similar situation. Main reason is that the primary tables represent various official declarations, and they have a lot of fields. In addition, our customers are required by law to keep the data readily available for 5-10 years. Also the law states our customers are responsible for the data they send through our systems, so they want to see what they're sending.

So just to give an example of how field count "blows up", there can be up to 8 different parties related to a single declaration, and for each we want to have for display/print purposes our internal id, id in customers system, organization number, name, address, postal code, location, country code, contact person, contact person phone and email.

So we're talking about almost 100 fields just to store parties.

We want to store all that separate from say our "companies" table as a company might change name, location etc, and we're required to show/print the details from today 5+ years down the line.

We could put that in a separate table, but that means 8 joins to fetch data for a single record just there. And that's just one piece of the declarations, and at least 3-4 of these parties our customers want to see in overview grids and be able to filter on there.

Now tack on 20+ years of doing this, with various official systems coming and going with different technical requirements, and it adds up.

2 comments

> but that means 8 joins to fetch data

Without seeing your schema, that sounds like a bit much, but maybe not. That is kind of the point of a relational DB.

I always attempt to normalize as high as possible first, and then denormalize IFF it’s necessary for performance. Postgres has a default join limit (i.e. where it shifts to the genetic algorithm for joins, and may not be optimal) of 8, but I’ve still seen it do fine far above that.

If we normalized as much as possible we'd be looking at 20-30 joins just for the main view of a declaration. Main overview grid would have probably 10+ joins. And keep in mind almost all of these would have data, so instead of inserting one row you're now inserting 20-30 rows + updating corresponding foreign key indexes.

Think performance would be pretty crap, and developer experience as well.

Sounds like it's rife to convert to JSON fields maybe?
You do not want to do this, I assure you. The TOAST / DE-TOAST overhead alone would wreck performance, plus OP said it was update-heavy; Postgres can’t update a JSON object on disk, it just rewrites the entire blob.
> Postgres can’t update a JSON object on disk, it just rewrites the entire blob.

Postgres does this for _all_ row updates; it must, for transaction isolation to work.

I'm not sure, there's also HOT-updates (heap-only tuples). It's an optimization where data in modified in place if none of the modified columns are part of an index and maybe other conditions but I don't remember it all too well.
Did not know this; thank you!
I strongly disagree. This is an excellent application for JSONB. The user outlined that the records are for historical compliance purposes and therefore aren't frequently updated after the initial phase.

A schema of `party1 JSONB, party2 JSONB, ...`, or even `parties JSONB` would likely be far more maintainable and sensible than the "table-as-spreadsheet" approach.

If we could limit us to only PostgreSQL we could perhaps do that for new modules. Sadly we have some customers which want to use their own MSSQL database.

Though we'll see what the future holds. PostgreSQL is interesting due to cost and features, and many of the large customers are shifting their POV on how they want to run things, almost doing 180's in some cases.