Hacker News new | ask | show | jobs
by gcv 6195 days ago
Fascinating. I haven't finished reading the entire presentation yet, but I already stumbled on something.

The author lists a drawback of the first antipattern ("metadata tribbles"): a table split into multiples to keep its size down requires tricky querying, specifically, it requires a union across all splits (slide 10). It also requires extra effort to keep table structures synchronized (slide 11). Then, in a solution to his second antipattern ("entity-attribute-value"), he suggests doing something quite similar to his first antipattern: creating multiple tables with nearly identical columns and using a union to query across them (slides 27 and 28).

Hello? So to resolve antipattern 2 you just apply antipattern 1? This is why precisely why relational databases for most problems look like square pegs being hammered into round holes.

[EDIT] Not sure if it's worth finishing reading this presentation. Slide 56, a "solution" to the problem of storing a hierarchy in a relational database. Breadcrumbs. Not bad, except that the query now requires a "like" clause. Whoops, can't use an index on that column anymore! (Feel free to correct me if there exists a RDBMS which can use an index on a "like.") Hope the author doesn't mind a nice table scan on that query. He doesn't even mention this problem.

5 comments

Anti-pattern one was critiquing using values in table definitions, the second was using attributes in table definitions. These are not the same thing. There isn't a similarity between values and attributes in the relational model, which is the whole point of avoiding key-value strategies which are confusing an attribute with a value.

A big part of why the modelling of data that you don't know the shape of in advance in a relational database is hard is because reasoning about data you don't know in advance is hard, and the model exposes that difficulty up front.

(Also, if like has its wildcard at the end of a string not the beginning, indexes can help you as they can look into the sorted trees.)

> Feel free to correct me if there exists a RDBMS which can use an index on a "like."

huh? mysql does that for 'like' queries which don't start with a wildcard (that's exactly what you need for breadcrumbed trees).

also (i didn't read TFA, though) you don't have to use 'like'. you can use BETWEEN or just "path >= 'a' and path < 'b'", which, i suppose, uses index in even the most stupid RDBMS in the world).

prooflink log from my live real production database with a breadcrumbed tree: http://pastie.org/541611

Well, certainly in Oracle, it can use an index on a like if it is structured:

    where col like 'str%'
However, doing

   where col like '%str'
Means it cannot use any index. So in the example given, it would be able to use an index because there is a leading 'path' before the %, assuming MySQL can do the same thing.

In Oracle you can do hierarchy query's using the 'connect by' syntax, which means you can structure your table with id, parent_id and get all the rows out efficiently in a single piece of SQL, which is nice!

You use a database that supports CTE or at the very least, recursive querying options, then you format your schema like so:

  Comments:
  /-----------------------------\
  |   ID   | ParentID |   etc   |
  |-----------------------------|
  |    1   |   Null   |   etc   |
  |    2   |   Null   |   etc   |
  |    3   |     1    |   etc   |
  |    4   |     1    |   etc   |
  |    5   |     4    |   etc   |
  \-----------------------------/
You then use a Common Table Expressions query, like this:

  WITH CommentTree (ParentID, ID, Level)
  AS
  (
    SELECT ParentID, ID, 0 as [Level], 
        FROM Comments
        WHERE ParentID = [ROOT COMMENT ID] OR ParentID IS NULL
    UNION ALL
        SELECT c.ParentID, c.ID, ct.Level + 1
            FROM Comments c
            JOIN CommentTree ct ON ct.ID = c.ParentID
  )
  -- Now actually query the data from the CTE Expression
  SELECT ParentID, ID, Level FROM CommentTree 
     ORDER BY ParentID, Level, ID
This returns a table with a calculated column called Level.

  /---------------------------\
  | ParentID | Level |   ID   |
  |---------------------------|
  |   Null   |   0   |   1    |
  |   Null   |   0   |   2    |
  |    1     |   1   |   3    |
  |    1     |   1   |   4    |
  |    4     |   2   |   5    |
  \---------------------------/
Which of course, would represent this tree:

       Root
      /    \
     2      1
           / \
          3   4
               \
                5
The like search will still be faster than hitting the db repeatedly to get each node's children which is how most people seem to implement retrieving a tree.

This "path" method is straight out of SQL for Smarties - Hierarchies & Trees" and though the nested set method is probably better for most uses, the path column has its uses.

There's a few tricks you can do with Oracle function based indexes to optimize out like clause in this case. Can't speak for other RDBMS but I imagine they have similar facilities.