Hacker News new | ask | show | jobs
by cico71 5104 days ago
I was about to prepare the query without ties and yes it is a lot harder than necessary and, more important IMO, much less readable/intuitive for people than have to maintain the code and that is exactly why different SQL dialects introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc.

But let me be a little bit picky about this and Date's view on ORDER BY.

The relational model deals just with the algebra/calculus without getting into the details of a language based on the model.

In the book pointed out by Matt, Date explicitly states he's not saying ORDER BY is not useful, just that it doesn't return a relation and thus it's not included in the algebra.

However a language based on the relational model, like Tutorial D, can include such an operator. To be double sure, I checked on The Third Manifesto V2 and a LOAD operator with an ORDER specification is defined in the context of the "special cased" support for arrays.

You can see a couple of paragraphs if you go here http://books.google.it/books?id=X85QAAAAMAAJ&dq=editions... and search for ORDER and LOAD (page 118).

TBH I'm not even sure it's a good idea to introduce arrays for ordering, but, anyway, back to the quota queries: we agree that regular aggregation operators are enough although the query becomes very complex.

In the same book referenced by Matt there's an exercise (7.14) showing how to do a quota query and you can see that even in Tutorial D it's complex.

However, in the solution, Date & Darwen also propose something else: to add a more specific RANK operator which is really just syntactic sugar to simplify this kind of queries. With the important difference, compared to ORDER, that it still returns a relation and not an ordered sequence of tuples.

Unfortunately the whole solution to the excercise is not available through Google Books preview, and the operator is formally defined elsewhere, but you can see how such RANK operator would work here http://books.google.it/books?id=WuZGD5tBfMwC&lpg=PA163&#...

Just a final comment about Date being reluctant to analyze the matter, unfortunately his work is disseminated in a lot of books (and he changed his position on quite several matters throughout the years).

I love the "SQL and Relational Theory" one but, having red all of his books, I would be hesitant to suggest it unless one already knows Date. I think the latest edition (8th) of "An Introduction to Database Systems" is still the best book to start with.

1 comments

I was about to prepare the query without ties and yes it is a lot harder than necessary and, more important IMO, much less readable/intuitive for people than have to maintain the code and that is exactly why different SQL dialects introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc.

Very good point.

In the book pointed out by Matt, Date explicitly states he's not saying ORDER BY is not useful, just that it doesn't return a relation and thus it's not included in the algebra.

My biggest gripe about ORDER BY, LIMIT and relational model is the fact that while Date and others made some attempts to express these operations in terms of relational algebra, they never (AFAIK) tried to do the opposite: alter the relational query model to naturally support them. It's not hard: just replace sets with sequences or arrays. It will gives you natural ORDER and SLICE operators as well as new aggregates FIRST, LAST, NTH. It solves duplicates without having to introduce bags, gives windowing functions for free and probably better represents how modern RDBMS interpret a query. Another hint why sequences may work better than sets is the fact that regular set operations such as INTERSECT and UNION (as opposed to UNION ALL, which becomes concatenation) are so rarely used in real-world queries.

I'm not even arguing that this is a good approach, but I think it deserves some discussion and it appears they never even thought of a possibility of changing the model treating it not as an instrument, but as a sacred scripture.

First I don't know why order by, limit, offset, or windowing functions, can't be said to return a relation if we define relations in a way which is sufficiently useful to include these operations. In other words, they are used in ways which returns sets of tuples (or sets of entities if you want to see it that way), based on specific selection criteria.

I would thus agree that to the extent that these are not part of the relational model this says more about the incompleteness of that model than it does about the operations themselves.

Relations being defined what they are is not accidental; sets are not bags and there are a lot of very good things that come out of a relation having a very well defined, uh, definition.

It's not a matter of "usefulness" but of "well-defined" that allows us to derive a whole lot of other interesting things.

Folks actually have defined ORDER BY, LIMIT, OFFSET, etc. in terms of the RM; it's just that the typical ORDER BY doesn't return a relation because of ordering (sets are unordered by definition) and so there was a lot of gymnastics they had to do in order to keep the set theory intact.

Sure, arbitrarily reordering is not a hard concept (or even implementation) but to make sure you cover all the bases requires a significant amount of work. A RDBMS is a complicated thing and you don't want to just add something to it without doing proper due diligence.

One could argue that NULLs are more "useful" (I disagree) but the addition of NULLs (a deceptively simple concept) has vastly overcomplicated SQL and lead to a number of inconsistencies in the spec.

Sets may be unordered by definition but that doesn't mean you can't define something interesting as an ordered set.

Consider the Pythagorean attempt to prove that all numbers were rational by trying to prove that the square root of two was rational. That they were able to prove that it was not rational meant that we ended up with a new category of numbers. Similarly once you get into the square root of -1 you get into yet another category of numbers designed to address that problem.

Our numeric model isn't complete with just rational numbers, or just rational and irrational numbers. Today we have to add imaginary and complex numbers as well. Why shouldn't we be expanding relational math in the same way?