Hacker News new | ask | show | jobs
by dspillett 1064 days ago
There are many papers describing how NULL as an explicit “value” (you still at least need something to present empty results from left or right joins) can be made unnecessary – just look back to the development of relational databases and the theory that are based upon. No need to pay for this author's paper when it probably just restates things that are found in many open access papers/books/other.

Representing unknown/non-existent without allowing an explicit NULL value is can be modelled by the property being another entity, you have a child table that lists the values of the property so where you would store a NULL otherwise there simply exists no row in that table. In SQL this means extra joins and in most (all?) SQL implementations this results in (sometimes significantly) lower performance. The big argument there is whether this points to a problem in the theory (if arguing that NULL should not be “stored”), in SQL as an implementation of relational database theory, or in the implementations of SQL…

[I'm aware my terminology is all over the place, as most people's is: when talking about relational theory rather than SQL as a speicific implementation of it I should use tuple not row, relation not table, etc., but getting that right only serves to confuse people (a great many of those with a more self-taught and/or field trained programming background background than one that involves any computer science study) who know only SQL and have thus far not needed to be aware of the theory or history].

2 comments

It's not a problem with the theory, because the theory was not created to be a high-performance application.

It's also clearly not a problem with SQL since people very loudly and clearly prefer the version with nulls and operations that create nulls.

It is a problem with the idea that people should program in relational algebra. They shouldn't. It's the same kind of issue as functional languages that extend lambda calculus; logical languages with explicit evaluation strategies; type systems with undecidable situations; module systems that allows access to private features... No actual usable system is a perfect representation of the math behind it.

But how to represent missing values in left/right/outer joins?
I think this one of the many things that the author wanted to point out, outer joins is not a valid relational model operator, and thus, leads to generating nulls. Outer joins would have to be some kind of subqueries in relational model, which would likely cause either way too subtle querying semantics or, as little as the would, still always unacceptable performance cost, given how databases are benchmarked.