by jacques_chester 5334 days ago
These are not new approaches to data modelling.

Document databases, network databases and hierarchical databases (IMS, CODASYL, etc.) predate commercial relational databases by a decade or more.

Relational is the universal default for a simple reason. When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

It's as simple as that. Relational is a slam-dunk, no-brainer for 99.99% of use cases.

Still, if you really want a fast, proven system for one of the older models, you can get IBM to host stuff for you on a z/OS or z/TPF instance, running IMS. It'll have more predictable performance than AWS to boot.


I agree entirely. I think when people rebel against "relational databases" they're actually just realizing that the normalization fetish can be harmful in many application cases.

You're better off with MySQL or PostgreSQL managing a key-value table where the value is a blob of JSON (or XML, which I've done in the past), then defining a custom index, which is pretty damn easy in PostgreSQL. Then you have hundreds of genius-years of effort keeping everything stable, and you still get NoSQL's benefits. Everybody wins.
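A minimal sketch of that key/value-plus-JSON pattern, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; the table, keys and field names are made up). In PostgreSQL the "custom index" would be an expression index along the lines of CREATE INDEX ON kv ((value->>'email')); here SQLite's json_extract plays the same role, assuming your SQLite build includes the JSON functions (most do).

```python
import json
import sqlite3

# In-memory SQLite standing in for PostgreSQL (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

# The "custom index": an index on an expression pulled out of the JSON blob.
conn.execute("CREATE INDEX kv_email ON kv (json_extract(value, '$.email'))")

def put(key, doc):
    """Store a Python dict as a JSON blob under key (insert or overwrite)."""
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        (key, json.dumps(doc)),
    )

def find_by_email(email):
    """Find keys by a field inside the blob. The WHERE clause repeats the
    indexed expression exactly, so the planner is able to use kv_email."""
    rows = conn.execute(
        "SELECT key FROM kv WHERE json_extract(value, '$.email') = ?",
        (email,),
    )
    return [r[0] for r in rows]

put("user:1", {"email": "a@example.com", "prefs": {"theme": "dark"}})
put("user:2", {"email": "b@example.com", "tags": ["x", "y"]})
print(find_by_email("b@example.com"))  # ['user:2']
```

The schemaless part stays schemaless: the two documents have different shapes, and only the field you actually search on gets an index.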

Normalization is a tricky thing. On one hand, highly normalized databases have better flexibility in reporting, IMHO. On the other, you lose some expressiveness regarding data constraints. High degrees of normalization would be ideal if cross-relation constraints were possible. As they are not, typically one has to normalize in part based on constraint dependencies just as much as data dependencies.

First, the more I have looked, the more I have found that non-relational database systems are remarkably common and have been for a long time.

The relational model is ideal in many circumstances. However, it breaks down with semi-structured content, with content where all three of the following hold (hierarchical structure is important, data is seldom written and frequently read, and read performance navigating the hierarchy matters most), and so forth.

So I'd generally agree, but not every problem is in fact a nail.

> However, it breaks down with semi-structured content, with content where all three of the following hold (hierarchical structure is important, data is seldom written and frequently read, and read performance navigating the hierarchy matters most), and so forth.

Again, this problem is not new. Database greybeards call this OLAP and it's been around since the 80s.

There is nothing new under the sun in this trade.

No. I am talking about something like LDAP, not OLAP. LDAP may suck badly in many, many ways, but it is almost exactly unlike OLAP.

OLAP typically refers to environments that produce complex reports quickly across huge datasets, so they lean heavily on materialized views, summary tables, CUBE operators and the like. Hierarchical directories are different. In a relational model you have to traverse the hierarchy to get the single record you want, and you are not aggregating the way you typically do in an Online Analytical Processing environment.

This is why OpenLDAP with a PostgreSQL backend sucks, while OpenLDAP with a non-relational backend (say BDB) does ok.

I am not saying anything new is under the sun, just that some of the old structures haven't gone away.

I was referring to the read/write preponderance. Normalisation optimises write performance, storage space and also provides strong confidence of integrity. But it means lots of joins, which can slow things down on the read side.

That's why OLAP came along: structured denormalisation, usually into star schemata, which provides fast ad-hoc querying. I think part of the enthusiasm for NoSQL arises because most university courses and introductory database books will go into normalisation in great detail, but OLAP might only get name-checked. So folk can get an incomplete impression of what relational systems can do.
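A toy star schema, sketched in Python with SQLite standing in for a real OLAP store (table names and data are invented for illustration). One fact table holds the measures and every dimension is a single join away, so ad-hoc roll-ups stay short instead of chaining through a fully normalised schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
-- Fact table: one row per sale, pointing at each dimension.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date,
    product_id INTEGER REFERENCES dim_product,
    amount     REAL
);
INSERT INTO dim_date    VALUES (1, 2010, 1), (2, 2010, 2);
INSERT INTO dim_product VALUES (10, 'books'), (11, 'music');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 11, 3.0), (2, 10, 7.0);
""")

# Ad-hoc analytical query: group the facts by attributes of two dimensions.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)  # [(1, 'books', 5.0), (1, 'music', 3.0), (2, 'books', 7.0)]
```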

If I had a purely K/V data problem (a cache, say), I would turn to a pure K/V toolset such as Memcache.

Hierarchical datasets have long been the blind spot for relational systems: representable, but usually only through fiddly schemes. But in the last decade SQL has gained recursive queries (WITH RECURSIVE, standardised in SQL:1999), so it's not as big a problem as it used to be.
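For reference, a recursive query over a plain adjacency list (each row points at its parent) looks roughly like this. It is a sketch against SQLite through Python's sqlite3 module; the node table and its directory-like data are hypothetical. PostgreSQL accepts essentially the same WITH RECURSIVE query, while Oracle historically used CONNECT BY for this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES node, name TEXT);
INSERT INTO node VALUES
  (1, NULL, 'com'),
  (2, 1,    'example'),
  (3, 2,    'people'),
  (4, 3,    'alice'),
  (5, 2,    'groups');
""")

def subtree(root_id):
    """Return (name, depth) for root_id and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?      -- anchor: the root
            UNION ALL
            SELECT n.id, n.name, s.depth + 1               -- step: children of rows found so far
            FROM node n JOIN sub s ON n.parent_id = s.id
        )
        SELECT name, depth FROM sub ORDER BY depth, name
    """, (root_id,)).fetchall()

print(subtree(2))
# [('example', 0), ('groups', 1), ('people', 1), ('alice', 2)]
```

Each level of the hierarchy is another iteration of the recursive step, which is where the read-side cost of deep directories comes from.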

Normalization is formally defined based on data value dependencies. However, because there is no way to set constraints across joins, in practice, the dependencies of data constraints are as important as the dependencies of data values.
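A hypothetical illustration of the point (my example, not necessarily the case the poster had in mind), sketched with SQLite via Python's sqlite3. A CHECK constraint can only see its own row (SQLite rejects subqueries in CHECK), so a rule that spans two tables, such as "salary must not exceed the grade's maximum", has to be enforced procedurally with a trigger, and every statement that could break it needs its own trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grade (grade_id INTEGER PRIMARY KEY, max_salary INTEGER NOT NULL);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    grade_id INTEGER NOT NULL REFERENCES grade,
    salary   INTEGER NOT NULL CHECK (salary > 0)  -- row-local rule: CHECK is fine
);
-- Cross-table rule "salary <= grade.max_salary" lives in a trigger.
-- This one only guards INSERT on employee; UPDATE on employee and
-- UPDATE on grade would each need a trigger of their own.
CREATE TRIGGER employee_salary_cap BEFORE INSERT ON employee
BEGIN
    SELECT RAISE(ABORT, 'salary exceeds grade maximum')
    WHERE NEW.salary > (SELECT max_salary FROM grade WHERE grade_id = NEW.grade_id);
END;
INSERT INTO grade VALUES (1, 50000);
""")

def hire(emp_id, grade_id, salary):
    """Try to insert an employee; return False if any constraint fires."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)",
                         (emp_id, grade_id, salary))
        return True
    except sqlite3.IntegrityError:
        return False

print(hire(1, 1, 40000))  # True
print(hire(2, 1, 90000))  # False: blocked by the trigger
```

If the rule mattered enough, one alternative is to shape the tables so the rule becomes row-local, which is exactly the sense in which constraints end up driving the normalization.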

As for recursive queries, I am not 100% sure they are ideal from a read-performance perspective either. There are times when recursive queries help performance, but I don't see a good way to index, for example, the path to a node. Certainly most databases don't do this well enough to be ideal for hierarchical directories. I am not even sure you could index the path reliably in PostgreSQL, because the function that computes it is not immutable.
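One common workaround (a swapped-in technique, not something proposed in the thread) is a materialized path: store each node's full path as an ordinary column, maintained by the application or a trigger, so no non-immutable function is involved and a plain B-tree index serves both point lookups and subtree scans. A sketch in Python/SQLite with hypothetical LDAP-ish paths:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# path is the primary key, so it is backed by a B-tree index.
conn.execute("CREATE TABLE entry (path TEXT PRIMARY KEY, attrs TEXT)")
conn.executemany("INSERT INTO entry VALUES (?, ?)", [
    ("/com/", "{}"),
    ("/com/example/", "{}"),
    ("/com/example/people/", "{}"),
    ("/com/example/people/alice/", '{"mail": "alice@example.com"}'),
    ("/com/example2/", "{}"),
])

def get(path):
    """Point lookup by full path: one index probe, no hierarchy traversal."""
    row = conn.execute("SELECT attrs FROM entry WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None

def subtree(prefix):
    """All entries under prefix (which must end in '/') as an index range scan.
    The upper bound replaces the trailing '/' with the next character, '0',
    so siblings like '/com/example2/' fall outside the range."""
    upper = prefix[:-1] + chr(ord("/") + 1)
    rows = conn.execute(
        "SELECT path FROM entry WHERE path >= ? AND path < ? ORDER BY path",
        (prefix, upper),
    )
    return [r[0] for r in rows]

print(get("/com/example/people/alice/"))
print(subtree("/com/example/"))
# ['/com/example/', '/com/example/people/', '/com/example/people/alice/']
```

The cost moves to the write side: moving or renaming a node means rewriting the path of every descendant, which fits the seldom-written, frequently-read directory case described earlier in the thread.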

Your replies so far are excellent. You're pointing out things I've overlooked, thanks.

> However, because there is no way to set constraints across joins, in practice, the dependencies of data constraints are as important as the dependencies of data values.

I don't follow your argument here. Could you restate it?

> As far as recursive queries, I am not 100% sure this is ideal either from a read performance perspective. There are times when recursive queries are helpful from a performance perspective, but I don't see a good way to index, for example, path to a node.

Poking around the Oracle documentation and Ask Tom articles, it seems to be more art than science; mostly based on creating compound indices over the relevant fields. Oracle is smart enough to use an index if it's there for a recursive field, but will struggle unless there's a compound index for other fields. I don't see an obvious way to create what you might call 'recursive indices', short of maintaining a materialized view (MV).

> Certainly most databases don't do this well enough to be ideal for hierarchical directories.

It'll never perform as well as a specialised system, but then no relational system will. An RDBMS won't outperform a K/V store on K/V problems, won't outperform a file system at blob handling, and so on. This is just another example of the No Free Lunch theorem in action.

My contention is that we, as a profession of people who Like Cool Things, tend to discount the value of ACID early and then painfully rediscover its value later on. The business value of ACID doesn't show up in a benchmark, so nobody writes breathless blog posts where DrongoDB is 10,000x more atomic than MetaspasmCache.
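Atomicity is easier to demonstrate than to benchmark. A minimal sketch with a hypothetical accounts table (SQLite via Python's sqlite3): the credit is applied before the debit on purpose, so when the debit violates the CHECK constraint the database has to undo the already-applied credit as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK on the debit failed
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 30}
print(transfer("alice", "bob", 500), balances())  # False {'alice': 70, 'bob': 30}
```

Without the transaction, the failed transfer would have left bob 500 richer and alice untouched; that kind of bug is what gets painfully rediscovered later.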

> When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

That's not exactly true; what relational databases did was offer a generic query and constraint model that worked well across all cases while offering reasonable performance. They were not generally faster in the optimal cases, but they were much easier to query, especially for requirements that arrived after the fact, because the queries weren't baked into the data model itself. That generic query ability and general data model always come at the cost of speed; always. Document databases have always been faster in their optimal use case.

You're absolutely right -- RDBMSes were designed to solve problems with the nosql-type approaches that preceded them. The nosql bandwagon is blindly rolling into the past, where it will crash into the old problems of concurrency and consistency under load.

BTW, if you want nosql-style schema flexibility within an RDBMS, a simple solution is to store XML or JSON in a character blob. Keep the fields you need to search on in separate, indexed columns. If you make incompatible version changes, add a new JSON/XML column.
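A minimal sketch of that scheme, assuming SQLite via Python's sqlite3 and an invented customer table: email and country are real columns (country gets an index for lookups, email is UNIQUE), the rest of each record is an opaque JSON string, and an incompatible format change gets a new doc_v2 column instead of a migration of the old documents.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,  -- searched on, so it is a real column
    country TEXT NOT NULL,         -- likewise, with its own index below
    doc_v1  TEXT                   -- original JSON format
);
CREATE INDEX customer_country ON customer (country);
""")
conn.execute("INSERT INTO customer (email, country, doc_v1) VALUES (?, ?, ?)",
             ("a@example.com", "AU", json.dumps({"name": "Ann Smith"})))

# Later the document format changes incompatibly (name split into parts):
# add a new blob column rather than rewriting every existing document.
conn.execute("ALTER TABLE customer ADD COLUMN doc_v2 TEXT")
conn.execute("INSERT INTO customer (email, country, doc_v2) VALUES (?, ?, ?)",
             ("b@example.com", "AU",
              json.dumps({"name": {"first": "Bo", "last": "Li"}})))

def emails_in(country):
    """Indexed lookup on a real column; no JSON is parsed."""
    rows = conn.execute(
        "SELECT email FROM customer WHERE country = ? ORDER BY email", (country,))
    return [r[0] for r in rows]

def display_name(email):
    """Read whichever document version the row carries."""
    v1, v2 = conn.execute(
        "SELECT doc_v1, doc_v2 FROM customer WHERE email = ?", (email,)).fetchone()
    if v2 is not None:
        name = json.loads(v2)["name"]
        return f"{name['first']} {name['last']}"
    return json.loads(v1)["name"]

print(emails_in("AU"))                # ['a@example.com', 'b@example.com']
print(display_name("b@example.com"))  # Bo Li
```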

Another solution is to use the hstore feature in Postgres to store key/value data.

> BTW if you want nosql-style schema flexibility within an RDBMS, then a simple solution is to store XML or JSON in a character blob.

In all sincerity, I would strongly recommend against this. If your problem absolutely defies normalisation, don't use a relational database.

Very true, but it's a resurgence of modeling alternatives, which can only enrich our ability to write interesting applications. Yes, you can model a social network in an RDBMS, but it's not as efficient or as flexible as using Neo4j. And yes, you can model a key/value document in an RDBMS, but again it's not a good fit. The right tool for the right problem. You don't build a house with only a hammer, so why should we build applications on only one storage concept?