| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jhoxray 1739 days ago
	Thank you, very good points!

1 comments

throwaway81523 1739 days ago

Another saying is that all data relationships become many-to-many over time. First you think that each customer has a phone number, but then you find that they often have several phone numbers, and some phone numbers will belong to multiple customers, and so on.

link

bob1029 1738 days ago

This is the most important realization for me.

I am starting to chase a form of normalization even more strict than 6NF. Everything gets a synthetic key, there is no such thing as a domain key anymore. Facts are in tables 2 columns wide, no exceptions. Tuples of id & fact. The only other things I permit are a "root" type and a "relation" type, being 1 and 3 columns respectively.

If someone told me I need to start tracking when a phone number was last modified, or create a new approval loop for that change, I want this to be a concern entirely contained under the phone number tables. Not something that needs to be added to everything that has string column called PhoneNumber.

I strongly believe this degree of normalization to be perfect from a domain modeling perspective. It almost feels stupid to not do it this way looking back at all the unexpected changes we've endured.

SQL is also the perfect tool for putting Humpty Dumpty back together again. You can build views to present your change-proof 6NF+ schema in something the business can consume. I.e. wider tables that make some assumptions. If we get our assumptions wrong on the views, we just iterate so we don't break old SQL. VCustomers2, VCustomers3, etc. The cost of iterating your schema by way of views is negligible compared to fucking with it elsewhere.

link

magicalhippo 1738 days ago

> I strongly believe this degree of normalization to be perfect from a domain modeling perspective.

How's performance with several tens if not hundreds of joins in a query?

link

bob1029 1738 days ago

Performance is just fine for us. It's all about scoping for the right things. You are never looking for the whole schema all at once. No one ever said it had to be running on disk either.

link

magicalhippo 1738 days ago

> You are never looking for the whole schema all at once.

Well we have to. Our system exports the data to other systems, which would involve generating a file with all the data (think order or invoice but with 100+ fields). For many of our customers this job is time sensitive.

Besides that we have overview grids (think "all active orders") for our customers, for many of these our customer will want to have 20+ fields visible. Using your method that can be 50-100 joins, and this would be an interactive case so again time sensitive.

That's why I was curious how it scaled.

> No one ever said it had to be running on disk either.

Well our data absolutely has to be persisted to disk on commit, and a stale cache is not tolerable. But apart from that, sure.

link

bob1029 1738 days ago

Exporting data to other systems is a massively different idea from using the data in a business-transactional sense.

If you wanted to bulk export/stream transactions from my system, you would simply take the compressed event log batches and replicate them wherever they need to go. The working set is kept in memory and can be reconstructed by simply replaying the events.

If I were going to solve your problem, I would probably maintain 2 materialized working sets in memory, one potentially on another server that is slightly behind real-time. The cool thing here is you can replay the events into any arbitrary schema, as long as they are granular enough (6NF+).

link

dsego 1738 days ago

Is that anchor modelling?

link

bob1029 1738 days ago

Very nearly, yes. The biggest distinction is that all keys are synthetic.

link