by seibelj 1761 days ago
> NoSQL databases are maturing, for sure – we’re starting to see support for transactions (on some timeframe of consistency) and generally more stability. After years of working with “flexible” databases though, it has become clearer that rigidity up front (defining a schema) can end up meaning flexibility later on.

So funny to me that NoSQL boosters have only recently understood that designing sane schemas and knowing what order your data is inserted in is important for data integrity. It's like an entire generation of highly paid software devs never learned fundamental computer science principles.


To be fair, relational algebra is hard.

That being said: going back to 1970 to read the original "A Relational Model of Data for Large Shared Data Banks" by Codd (the paper which started the relational-database + normalization model) is incredibly useful.

But yeah, all of this debate about "how data should be modeled" was the same in 1970 as it is today.

-----

SQL doesn't quite fit 100% into the relational model, but it's certainly inspired by Codd's relational model and designed to work with the principles from that paper.

And strangely enough, legions of authors and teachers and courses do a worse job of explaining relational databases than Codd's original 11-page paper.

I had a specific class on relational algebra in uni and it is up there with algorithm design and analysis in the realm of classes that actually provided me the most long term value.

Relational algebra is a lot easier once you start viewing it as relational algebra - a declarative expression of intent that can be manipulated and re-expressed much like any other purely mathematical statement. Then, when performance tuning becomes the watchword, you take that flexible expression and slice and dice it into whatever shape the DBMS you're working with needs to run it fast. You always want to think of your queries as complex summoning spells that draw in the necessary resources in some particular pattern and then impose an expression form on that blob of data - then you'll skate through all things SQL.
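To make the "manipulated and re-expressed" part concrete, here is a minimal Python sketch. The `users` and `orders` tables, their columns, and the helper names are all made up for illustration. It only shows that filtering after a join and filtering before the join give the same relation, which is the kind of rewrite you (or a query planner) are free to make when tuning.

```python
# Hypothetical toy tables; names and columns are assumptions for illustration.
users = [
    {"user_id": 1, "city": "Austin"},
    {"user_id": 2, "city": "Tulsa"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "lamp"},
    {"user_id": 2, "item": "desk"},
]

def natural_join(left, right):
    """Pair up rows that agree on every column name the two rows share."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def select(rows, predicate):
    """Relational selection: keep only the rows matching the predicate."""
    return [row for row in rows if predicate(row)]

def in_austin(row):
    return row["city"] == "Austin"

# Same intent, two shapes: filter after the join, or push the filter below it.
filter_late = select(natural_join(users, orders), in_austin)
filter_early = natural_join(select(users, in_austin), orders)

print(filter_late == filter_early)
```

The second form joins fewer rows, which is usually what you want, but both say the same thing.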

My relational algebra comes from my study of constraint programming / optimization (a field closely related to 3SAT solvers).

From this perspective, the study of relations is more about solving these NP-hard problems. For example, coloring a graph. You can solve things "locally", such as:

    Texas | New Mexico | Oklahoma
    -----------------------------
    Red   | Blue       | Green
    Red   | Blue       | Yellow
    Red   | Green      | Blue
    Red   | Green      | Yellow
    Red   | Yellow     | Blue
    Red   | Yellow     | Green
    Blue  | Red        | Green
    Blue  | Red        | Yellow
    Blue  | Green      | Red
    Blue  | Green      | Yellow
    ...
    (etc. etc. for all other valid combinations)

And so on for each "locally valid graph coloring" (looking only at a limited number of states). You then combine all of these relations together to find a solution to the 4-coloring problem.
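A table like the one above can be generated rather than typed out. This is a rough sketch, assuming four colors and assuming Texas, New Mexico, and Oklahoma all border each other pairwise (that is my reading of the map, and `local_relation` is a name I made up):

```python
from itertools import product

COLORS = ("Red", "Blue", "Green", "Yellow")

def local_relation(states, borders):
    """Every coloring of `states` where no bordering pair shares a color."""
    index = {state: i for i, state in enumerate(states)}
    rows = []
    for row in product(COLORS, repeat=len(states)):
        if all(row[index[a]] != row[index[b]] for a, b in borders):
            rows.append(row)
    return rows

# Assumption: all three states touch each other.
tx_nm_ok = local_relation(
    ("Texas", "New Mexico", "Oklahoma"),
    [("Texas", "New Mexico"), ("Texas", "Oklahoma"), ("New Mexico", "Oklahoma")],
)

print(len(tx_nm_ok))  # 24 rows: 4 * 3 * 2
for row in tx_nm_ok[:4]:
    print(row)
```

The first rows come out in the same order as the table, starting with Red / Blue / Green.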

We can see that "solving" a 4-coloring problem is as simple as a natural join over all these relations (!!!), matching rows on the state columns they share. That is: Texas_NewMexico_Oklahoma join Texas_Oklahoma_Louisiana join Louisiana_Mississippi_Arkansas join ...

We can see immediately that "Texas_NewMexico_Oklahoma join Texas_Oklahoma_Louisiana" will create a new relation, a "Texas_NewMexico_Oklahoma_Louisiana" table. With the information from just two earlier tables, though, this new table is "incomplete", so to speak: you'll need to join it with many other Texas, NewMexico, Oklahoma, and Louisiana tables to ensure global consistency.
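Here is a small sketch of that one join step in Python. The join has to match rows on the shared Texas and Oklahoma columns (a natural join), otherwise the two tables could disagree about Texas. The border assumptions are mine: Texas touches all three other states, New Mexico touches Oklahoma, and Oklahoma does not touch Louisiana.

```python
from itertools import permutations, product

COLORS = ("Red", "Blue", "Green", "Yellow")

def natural_join(left_cols, left_rows, right_cols, right_rows):
    """Join two tables on every column name they have in common."""
    shared = [c for c in left_cols if c in right_cols]
    extra = [c for c in right_cols if c not in left_cols]
    out_rows = []
    for l in left_rows:
        lrow = dict(zip(left_cols, l))
        for r in right_rows:
            rrow = dict(zip(right_cols, r))
            if all(lrow[c] == rrow[c] for c in shared):
                out_rows.append(l + tuple(rrow[c] for c in extra))
    return tuple(left_cols) + tuple(extra), out_rows

# Assumption: all three of these states border each other.
tno_cols = ("Texas", "New Mexico", "Oklahoma")
tno_rows = list(permutations(COLORS, 3))

# Assumption: Texas borders Oklahoma and Louisiana, but those two do not touch.
tol_cols = ("Texas", "Oklahoma", "Louisiana")
tol_rows = [
    (tx, ok, la)
    for tx, ok, la in product(COLORS, repeat=3)
    if tx != ok and tx != la
]

cols, rows = natural_join(tno_cols, tno_rows, tol_cols, tol_rows)
print(cols)
print(len(tno_rows), len(tol_rows), len(rows))  # 24, 36, 72
```

The 72-row result is consistent for these four states only; nothing in it knows about Arkansas yet.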

We can imagine a big 48-wide table, one column for each of the 48 contiguous US states, as the final solution to a graph coloring problem. Successful calculation of this massive table will enumerate all possible solutions (!!) of the graph coloring problem.
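A scaled-down sketch of that final table, with five states instead of 48. It folds a natural join over one small relation per border and checks the result against brute force. The border list is my own assumption for this little region, and the helper names are made up.

```python
from functools import reduce
from itertools import permutations, product

COLORS = ("Red", "Blue", "Green", "Yellow")

# Assumption: the borders among just these five states, as I read the map.
BORDERS = [
    ("Texas", "New Mexico"), ("Texas", "Oklahoma"), ("Texas", "Louisiana"),
    ("Texas", "Arkansas"), ("New Mexico", "Oklahoma"),
    ("Oklahoma", "Arkansas"), ("Louisiana", "Arkansas"),
]

def edge_relation(a, b):
    """Two-column relation: every way to give two neighbors different colors."""
    return [{a: ca, b: cb} for ca, cb in permutations(COLORS, 2)]

def natural_join(left, right):
    """Pair up rows that agree on every column name the two rows share."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Fold the join over all local relations to get the "5-wide" table.
solutions = reduce(natural_join, (edge_relation(a, b) for a, b in BORDERS))

# Brute-force check: try all 4^5 colorings and keep the valid ones.
states = sorted({state for edge in BORDERS for state in edge})

def is_valid(coloring):
    return all(coloring[a] != coloring[b] for a, b in BORDERS)

brute = [
    coloring
    for coloring in (dict(zip(states, row)) for row in product(COLORS, repeat=len(states)))
    if is_valid(coloring)
]

print(len(solutions), len(brute))  # 96 96
```

The intermediate tables stay small here. On the real 48-state map they would not, which is the NP-hard part showing up.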

----------------

Somehow, I find it more obvious to understand relations from this perspective. If anything, learning about constraint programming has made my relational algebra better (and as a result, has made my database skills probably better too)

It's also a funny corner where if you "study relations" hard enough, you eventually reach NP-complete problems. 3SAT is easily written in the form of database relations after all :-) (But using a database as a 3SAT solver is probably a bad idea, despite being mathematically connected)
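For the curious, a toy sketch of 3SAT as relations: each clause becomes a table of the 7 assignments to its three variables that satisfy it, and the formula is satisfiable exactly when the join of all clause tables is non-empty. The clause encoding (`(variable, wanted_value)` pairs) and the example formulas are my own, and it assumes each clause mentions three distinct variables.

```python
from functools import reduce
from itertools import product

def clause_relation(clause):
    """Rows over the clause's variables that make at least one literal true."""
    variables = [var for var, _ in clause]
    rows = []
    for values in product((False, True), repeat=len(variables)):
        row = dict(zip(variables, values))
        if any(row[var] == wanted for var, wanted in clause):
            rows.append(row)
    return rows

def natural_join(left, right):
    """Pair up rows that agree on every column name the two rows share."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def all_models(formula):
    """Join every clause relation; each surviving row is a satisfying assignment."""
    return reduce(natural_join, [clause_relation(c) for c in formula])

# (a or b or c) and (not b or c or d) and (not a or not c or not d)
formula = [
    (("a", True), ("b", True), ("c", True)),
    (("b", False), ("c", True), ("d", True)),
    (("a", False), ("c", False), ("d", False)),
]
models = all_models(formula)

# All 8 sign patterns over x, y, z: every assignment falsifies one clause.
unsat_formula = [tuple(zip("xyz", signs)) for signs in product((False, True), repeat=3)]
no_models = all_models(unsat_formula)

print(len(clause_relation(formula[0])), len(models), len(no_models))  # 7 10 0
```

This is the naive join, so it blows up exactly the way you'd expect an NP-complete problem to.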

That's exactly what it is. "Self-taught coder" really isn't the right term for what many are, as it implies some form of intentional individual study. More like "self-learned to duct tape shit together thanks to Stack Overflow" but we don't have a catchy term for that.