by parasubvert 3958 days ago
The relational model is extremely general; it's been argued (fairly successfully) by Date and Codd to be THE most general model (Graph being a close second). It's a rigorous approach to managing data with integrity.

I used to be a programming language oriented person, was big into data structures and objects, but then I read Date and my mind was blown at how beautiful and expressive the relational model is -- for its intended purpose (managing data for logical integrity and ad hoc queriability).

The main issues are:

1. Many implementations don't include some features such as unions.

2. Certain things (tree traversal) have also been hard to express in older versions of SQL or older versions of Tutorial D (Chris Date's language that's closer to the model).

3. Sometimes you don't care about long term data management (i.e. ad hoc queriability and integrity), you just want programmatic data persistence with pre-baked access paths that are FAST.

4. Relational integrity features are often crude implementations that slow things down too much or require custom triggers.

5. Queriability in reality requires decent knowledge of the physical layout and indexing if you're going to make it performant.

6. Most relational databases were not built in a cloud-native era, where we assume distribution across ephemeral disks and compute.

So... great mathematical model, great way to think through and organize your data for no ambiguity, but the practical implementations leave a lot to be desired.

The problem is that "my data is too complex for the relational model" often means "I haven't thought through my data". Things like maps, unions, ordered sets, N-ary relationships, graphs and trees are actually quite straightforward to represent in relations. The challenge is that many of the lessons and arguments for this are trapped in books from the 70s-90s, not on the Web.
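To make the "straightforward to represent" claim concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`node`, `playlist`), the sample rows, and the `descendants` helper are all made up for illustration. It assumes a SQLite build with recursive CTE support (3.8.3 or later), which is the kind of feature older SQL versions lacked for tree traversal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A tree as a relation: each row names a node and its parent.
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        name      TEXT NOT NULL
    );
    INSERT INTO node VALUES
        (1, NULL, 'root'),
        (2, 1,    'a'),
        (3, 1,    'b'),
        (4, 2,    'a1');

    -- An ordered set as a relation: position is just another attribute,
    -- and the UNIQUE constraint keeps it a set.
    CREATE TABLE playlist (
        position INTEGER PRIMARY KEY,
        track    TEXT NOT NULL UNIQUE
    );
    INSERT INTO playlist VALUES (1, 'intro'), (2, 'verse'), (3, 'outro');
""")


def descendants(conn, root_id):
    """Return (name, depth) for root_id and everything beneath it."""
    return conn.execute(
        """
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, sub.depth + 1
            FROM node AS n JOIN sub ON n.parent_id = sub.id
        )
        SELECT name, depth FROM sub ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


if __name__ == "__main__":
    print(descendants(conn, 1))
    print(conn.execute("SELECT track FROM playlist ORDER BY position").fetchall())
```

The tree is nothing more than a parent attribute on each row, and the ordering is nothing more than a position attribute; a map is the same idea with a key column as the primary key.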

2 comments

Agree, 100%. Early in my career I too was very enamored of object and graph databases (this was before the 'nosql' buzz), but once I started reading Date and Codd (and the inflammatory Fabian Pascal) some lights started turning on in my brain.

Firstly, it should be made clear to people that SQL is not truly relational, and a lot of the things people dislike about it are nothing to do with the relational model and more to do with its late 70s, early 80s heritage. It was thrown together at a time when business systems were still very focused around COBOL.

The second thing that people are not picking up on is that the industry _already_ went through a pre-sql "nosql" phase in the 60s and 70s when network and hierarchical databases were the norm, and the relational model was developed to deal with the perceived faults those systems had: an enforced topology which could not easily be rearranged at query time, lack of rigor in modeling, lack of standardized modeling concepts and notation...

Finally, I did find in previous jobs certain uses for nosql systems -- very low latency high throughput quasi-realtime systems that deal with very small bits of simply structured data and need to distribute it widely across a cluster. For that I used Cassandra (tho I understand now that there are successors that are better).

What I don't get is the point of systems like Redis or MongoDB, which don't offer a compelling distribution implementation and simply replace the fairly well understood quasi-relational model of SQL with their own ultimately inferior graph/network/hierarchical models.

As a fan of Cassandra (though not for its relational model ;), what successors do you believe are better? Riak is the only one that comes to mind.

Btw, to me the point of Redis and Mongo is a very fast distributed dictionary. They're data structure servers, for when you want to persist and share data structures across processes, not "manage information". It depends on your goal.

Oh, this is true, and I think I came off too harshly. The relational model is general enough to support (almost?) any data model, it certainly has a lot of advantages in terms of efficient implementation, and the math is elegant.

I just don't think that it naturally reflects the data structures people use, and we should be willing to make the computer do the work, rather than humans.