| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cheald 5616 days ago

Indeed. A lot of the reason I like document databases so much is that you can solve problems differently (and many times, much more easily) than you could in a relational database. Compare:

    db.posts.find({tags: {$in: ["foo", "bar"]}})

to:

    SELECT * from posts JOIN taggings ON taggings.post_id = posts.id JOIN tags ON tags.tagging_id = taggings.id WHERE tags.tag IN ('foo', 'bar');

(Single query tag lookup; naively joins the entire taggings and tags tables before limiting with WHERE)

Or a "better" query with two subselects (yikes!)

    SELECT * FROM posts where post_id IN (SELECT taggings.post_id FROM taggings WHERE taggings.tag_id IN (SELECT id FROM tags WHERE tags.name IN ('foo', 'bar')))

And that's the "find where any tag matches" case. Try the "when all tags match" case ($all in MongoDB), and you'll go grey a few years earlier.

3 comments

mjb 5616 days ago

> (Single query tag lookup; naively joins the entire taggings and tags tables before limiting with WHERE)

Really? Which modern RDBMS database actually works like this when indexes are available?

Index selection and optimization is a standard feature of even the simplest relational database system. JOINs on keys with high selectivity (especially unique keys) are extremely efficient in modern database systems - in some common cases just as efficient as pulling two columns out of the same table. Subselects are something that several database systems (MySQL, for example) really suck at - but they are easily avoided in most schemas.

> Try the "when all tags match" case ($all in MongoDB), and you'll go grey a few years earlier.

Depending on your database engine, even very large AND chains can be extremely efficiently computed on an indexed column. For larger use-cases, there are good ways to avoid this problem using careful schema design.

There are legitimate reasons why relational databases aren't the solution to every database problem. Claiming issues which clearly do not exist is not a good way to get points for the other side.

link

jdf 5616 days ago

Looking at your first SELECT, there's very few RDBMSs on the market that won't evaluate that WHERE clause prior to the JOIN.

(disclaimer: I work for Clustrix)

link

fedd 5616 days ago

relational folks can denormalize when they want to

link

sqrt17 5616 days ago

They get strangled by their DBA before they're finished with it.

And: the point of document stores is that you can denormalize (i.e. put lots of stuff into one data item) and _still_ use indices into the lots of stuff. Most commercial RDBMSes have means and ways to do that (e.g. storage of XML content), but afaik it's not standardized and would tie you closely to a single ($$$$$$) database vendor who will not hesitate to make you bleed whenever they can.

link

fedd 5616 days ago

the things like olap and intentional denormalization is not that dba are against of.

by denormalization i didnt mean keeping xml in a clob, but keeping the values in a row as if several tables are already joined into one wide table, thus eliminating the need in joins.

link