Hacker News new | ask | show | jobs
by dagheti 6284 days ago
The flexibility of the relational model is built on its bare simplicity: values + sets + logic. If you propose adding to this and giving up the benefits provided by these simplifications (simplicity of reasoning, ability to change the physical implementation without affecting the logical one [those not-magic indexes], declarative constraints, declarative queries), you need to have a really good reason.

The network model if it means anything is letting you add pointers to the mix and changing how you reason about your data from sets to graphs. This makes it harder to declare constraints, check integrity, update sets of data, access your data in different ways, and reason about your queries. An imperative script is in no way as safe a program as a declarative query.

The most clear benefit in my view to the network model it lets you use your object code fairly seamlessly. Ok.

The problem is that this is a good trade off for programs where you don't care primarily about your data, but about your code. I'd argue that isn't true for the majority of programs that have databases at all. The issues that kill your system years down the line and give you nightmares are not code issues, they are data issues. And the code you write to fix those problems... will end up re-implementing all those annoying "heavy" bits of RDBMSs that coders seems to hate.

1 comments

And the code you write to fix those problems... will end up re-implementing all those annoying "heavy" bits of RDBMSs that coders seems to hate.

Or you won't. I have many important KiokuDB applications in production. And they are not toy Web 2.0 things, they are important sites that perform important data analyses. I don't think any of the logic I had to write to support was particualarly difficult, and it was fewer lines of code than I would need to define my ORM classes.

The flexibility of the relational model is built on its bare simplicity: values + sets + logic.

Simplicity is good. However, software is complex, and complexity has to live somewhere. Look at git, for example. The underlying model is simple and beautiful, blobs, trees, and commits. Wonderful. But, to make that beautiful model into a revision control system, thousands of lines of code had to be written. So while simplicity makes the bottom part of the system simpler, it didn't do much for the overall simplicity of the entire system.

I feel the same way about object databases. They are more complex than a relational database (unless you count things like replication and embedded scripting and ... as relational database features, which I don't), but they help me decrease the complexity of the code I see.

As an example, when I use an RDBMS, I have to maintain:

SQL schema + upgrades

ORM classes

Logic classes (for abstractions over multiple tables or data stores)

Random scripts to query the database and give me munged reports

With an object database, I only have to write the logic classes and the random scripts. The random scripts are slightly more complicated, but not significantly more. The main app is much simpler, and much easier to test. I also don't have to hack my data model into the relational model. (Ever represent a tree in a relational database? It's a hack.)

> However, software is complex, and complexity has to live somewhere.

Exactly. This argument comes up in a lot of topics. For example, that pure functional programming is easier to reason about. True, but as soon as you have to simulate state it's more complex that just having state. This is also the reason that pure functional programming isn't automatically parallelizable. Data dependencies don't magically go away.

Same thing if you are simulating object database features in a RDBMS.

For example, that pure functional programming is easier to reason about. True, but as soon as you have to simulate state it's more complex that just having state.

I actually disagree here. It's true that the traditional state model of "everything gets everything" makes writing code very easy. Everything can see and modify everything else, so you never have to worry about manually passing state around.

This makes the code easy to write, but very difficult to debug. A function can do something different every time you call it with the same arguments. That is hard to reason about.

There are actually a few layers involved, the programming language, the data types, and the app code. In a non-functional language, you really ignore the data types and let your app code interact with the language. ("var foo = 42; ...; foo++")

In a language like Haskell, it's important to think about your data types. Your state abstraction will live there, and then your app code will not have to worry about the details of passing state around. So in the end, your app code looks the same in either paradigm.

An advantage about this approach is an increase in generality. If you just have some variables hanging around and some "stuff" that uses them, that's all you can ever have. If you interact with state via well-defined interfaces like the Monad or Applicative Functor, you can build abstractions that work on all types like this. This has worked very well for Haskell. (An example is the do syntax for monadic computation. Although you have functional purity, your program looks imperative. This, IMO, is the best of both worlds. There is some complexity under the hood, but you get code that's easy to read, write, and reason about.)

I agree with you here 100%. Maybe this is a good pathway to explaining why relational databases are good ideas in much the same way as having a programming strategy that uses a backbone of pure functions. This purity-dividend is exactly why the relational model is such a good strategy for database management.

In the same way haskell lets you separate your pure functional code from your monadic code, giving you the ability to referentially transparent reasoning and construct imperative machines where you must, the RDBMS approach applies total logic programming to database management. It's frustrating because your code cannot all live in one language as it can in Haskell, but the separation of pure from impure is exactly where the value is coming from.

RDBMSs focus on values and total computations (you know they will halt, making it even easier to reason about queries than non-total functions) allows you to isolate simple logic programs from the rest of your impure non-total code.

Navigational databases don't give the same benefits because they are not value based nor are they total. They are the imperative-model of the database world, and though they are easy to write, they will end up difficult to debug, maintain, and keep coherent.

I don't think this is what I meant.
I agreed with your overall point, but I was expanding on part of what you were saying:

You seem to appreciate how Haskell allows you to separate your pure code from your non-referentially transparent monadic code. This gives you reasoning where you can have it, and you use monads where you can't.

I'm just saying that same argument applies in databases where using a relational database allows you to separate your total code from your pure code. You use the value-based relational strategy where you can have it (necessary and sufficient for database management), and you use functional (or imperative) programming where you can't.

Ok so let's clarify this:

(SQL schema & upgrades + ORM classes + Logic classes, random scripts) - (logic classes + random scripts) = SQL Schema & upgrades + ORM classes

The question then becomes how do you handle the trade-off between being able to declare constraints (that evil schema) and the necessity to "map" how you access your data to your programming language.

If you "program" your constraints in your host language, you will need to either recreate the declarative system provided by a RDBMs or else create a system that will be fragile. Sure your code protects you against inserting bad data right? Well what about updating? What about when you delete? What if your constraint references another value in your database? What if that one changes? Integrity is best declared, not guarded by your program.

If your network model implements declarative constraints, then how do you define them? Those "random scripts" again? Suddenly all those performance benefits of navigation disspear when you're updating that data and checking those constraints.

I've seen many improtant production systems that used the network data model and as a warning: it usually ends badly. Be it COBOL, or whatever fresh re-invention of that navigation database wheel.

Integrity is best declared, not guarded by your program.

How do programs that don't use a database stay internally consistent?

With much testing and reinventing of the wheel.
I can definitely relate to the ugliness of ORM, and the elegance of object stores in an OOP environment. That's all fine and good, you can chop off the entire bottom of the stack, I get it.

My problem is data modeling with objects. I build my object architecture based on what I want to do with the data, but I've never been happy with it as a direct data model. It changes too quickly. Do I put things in an array or a hash? Well, it depends. Persisting these specific structures might be expeditious for the app at hand, but if I need to access the data in some other context then I could see complexity spiraling out of control very quickly.

A relational schema adds layers yes, but have low-level structured data adds tremendous value. There's a parity mismatch to be sure, but I'd rather do those situps than have no structured data at all.

Edit: One other thing, representing polymorphic data in a relational database is a nightmare.

Edit2: Wait, how did this end up in a separate comment? I am confused :)