Hacker News new | ask | show | jobs
by taffer 2463 days ago
> Relational databases get their name from the fact that relationships can be defined between tables.

This is a widespread misconception. Relational databases get their name from relations in the mathematical sense[1], i.e. sets of tuples containing facts. The basic idea of the relational model is that logical predicates can be used to query data flexibly without having to change the underlying data structures.

The basic paper by Codd[2] is really worth reading and describes, among other things, the problems of hierarchical and network databases that the relational model is meant to solve.

[1] https://en.wikipedia.org/wiki/Finitary_relation

[2] https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

2 comments

What is the relation? The table? (I.e. the tuples describing the rows). Or is it the joins? (In which case the article is correct). The columns?
One way to think about it is with a mathematical relation, like 'X > Y'. A relational database relation representing this relation would consist of a header tuple, <X, Y>, and a set of tuples whose values satisfy the relation, such as <10, 2>, <8, 3>, <9, 4>. In more common terms, the rows of this table would contain pairs of numbers in which the value of the X attribute is greater than the value of the Y attribute. This table describes the relation(ship) of certain pairs of numbers.

"Each tuple in a relation represents an n-ary relationship...among a set of n values..., and the full set of tuples in a given relation represents the full set of such relationships that happen to exist at some given time--and, mathematically speaking, that's a relation."[1]

[1] Chris Date, Database in Depth, page 46

This points to the dual view of a relation - intensional vs extensional. The beauty of the relational model of data is the morphism: by evaluating relational operations on the extensional model (data) we gain can answer questions corresponding to intensional model (concept).

For example given the extensional data ALBUM(TITLE, ARTIST) corresponding to the intention "the albums, each with a title and artist", we can compute "the eponymous albums, each with a title" via EPONYMOUS_ALBUM = ALBUM where (TITLE = ARTIST)

We started with some data for a relation corresponding with a concept, and were able to operate on the data to produce a new relation - data corresponding to a new concept.

The relation is essentially all the rows in a given table. In relational algebra, a relation is a set of tuples of a fixed length. Each position in each tuple is associated with some attribute (essentially the name of the column) and each element in a given position is a value of a certain "data domain" (essentially a data type, like "integer").
The table is the relation. A join is just an operation that combines two relations to a superrelation.
A join in an intersection in relational algebra
Oh, wow.

I've been in the industry for over 20 years. I even worked on SQL Server Analysis Services for 5 years (up to 256-dimensional 'cubes' in the early 2000s)... and I never thought of joins in this way. Granted, MDX was a beast in its own right.

Cue discussion about self-taught vs college educations. I've got advantages being self-driven learner... but I've definitely missed out in some regards.

MDX isn't a relational language, though. It's a dimensional language.

The best short description I can give about MDX is that it's the language your pedantic uncle would come up with after falling in love with Ralph Kimball, when all he knew to base it on was SQL.

But the core operations in MDX operate upon hierarchical dimensions and facts that can be aggregated.

    SELECT ... FROM ... WHERE
has no inherent relational semantic. It is simply a syntax that has been standardized upon for interacting with relational database systems. It was also, coincidentally, chosen as the syntax for another language, MDX.

Funny enough, the successor to SSAS Multidimensional is SSAS Tabular, where the query language is DAX. DAX was designed with an explicit goal of looking like Excel's formula language, but it is in fact a relational language which is semantically very similar to SQL, despite looking nothing like it.

Not necessarily, it can be a union too? Depends on the type of join.
A union is not a join at all - a union is a union. They're different concepts entirely.
Isn't a full outer join a Descartes product? And thus any other join is just a special case where rows get selected/projected/filtered out?
The Cross join is a Cartesian product. Good read: https://en.m.wikipedia.org/wiki/Join_(SQL)
The value in a table at a particular time is a relation, but so is the value of a view, the result of a subquery, the result of a select statement, and the value represented by the contents of a CSV file. In some API's this is called a row set.

It's essentially a table-shaped value. Conceptually it's immutable, and relational algebra is about calculating new values from old ones. A select statement does this too.

In a sense it reminds me of sentence structure. The table/relations is like the predicate and the rows contains all the subjects/objects that the predicate applies to.
A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.
As a non-mathematician I've always conceived of the relations among data as existing in the queries. In a SQL query the "relation" is specified by matching columns in the tables containing the data we're looking. Conceptually indexes, etc., are system implementation details.

That's certainly not a rigorous definition but helped me keep my head straight about what I was doing.

Thanks for the hint, we'll update the article! :)
Related to the logical view, it would also be great to include deductive databases:

https://en.wikipedia.org/wiki/Deductive_database

Deductive databases derive logical consequences based on facts and rules.

Datalog and its superset Prolog are notable instances of this idea, and they make the connection between the relational model and predicate logic particularly evident.

Codd's 1979 paper Extending the Database Relational Model to Capture More Meaning contains additional information about this connection. For example, quoting from Section 3 Relationship to Predicate Logic:

"We now describe two distinct ways in which the relational model can be related to predicate logic. Suppose we think of a database initially as a set of formulas in first-order predicate logic. ..."

It seems these rules refine the data outside the database itself. But they're so tightly integrated between the database and the application that the lines separating them become blurred.
Interestingly, when defining pure relations in Prolog, there is no "outside the database itself":

Both rules and facts are instances of a single unifying language element, a logical clause, which is also terminology from predicate logic.

This is useful from a conceptual perspective, and also for performance: Many automatic optimizations that deductive database systems perform apply equally to facts and rules. For instance, argument indexing is a notable feature of virtually all Prolog systems. It is similar to indices in relational databases, and can replace a linear scan of the knowledge base with a fast hash-based lookup that is performed in O(1).

> Both rules and facts are instances of a single unifying language element, a logical clause, which is also terminology from predicate logic.

Yeah; I'd like to see these general application layers more seamless and united in simple semantic units.

Like a higher level abstraction over code block elements, database record structures, and configuration setting ensambles; simply as phrases (..??) ....