Hacker News new | ask | show | jobs
by aristidb 5585 days ago
Nice! I did not know about this feature. How efficient is it, and which databases support it? :-)
1 comments

It is SQL standard. I use PostgreSQL, and I'm pretty sure Oracle supports it. MySQL doesn't seem to.

Performance wise it's not magical; it quite literally repeatedly scans the database, broadening the search tree until there is no fringe. On a table I have with nearly exactly that structure of about 300 rows and limited nesting depth, the transitive closure takes 4 ms on a crappy VPS.

Of course I'm sure graph databases are somehow caching the results or performing some other optimization; you can achieve the same effect in RDBMSes with materialized views (which just cache the results of this query & keep it updated). Oracle natively supports materialized views; you can hack it easily in PostgreSQL and MySQL (though it should really be built in).

In a connected graph, you don't want to cache this [1]. It will take up O(N^2) space. If you have 100k nodes, that would require 10^10 cached edges.

Further, insertion time becomes worst case O(N^2). Consider a graph with two connected components (each having O(N/2) nodes). Once you add an edge that connects them, updating the view will require O(N^2/4) operations - you need to connect every node in the first component to every node in the second.

A graph database is more specialized than SQL, so they can perform a very simple optimization. You don't scan the database, you just follow links. Links are typically stored with the node, so it's O(1) time.

[1] In a connected graph, there is a constant time algorithm - return True. But that won't work if you also want edge distance.

You're right about caching. I wasn't sure if graph databases took the typical NoSQL approach of massive non-normalized tables to avoid computational overhead.

SQL doesn't scan either -- tables are indexed via either a hash (O(1)) or a tree (O(log edges)). While I do realize the speed advantage of a specialized structure, I feel this could be better implemented as a specialized index/storage mechanism in SQL than as an entirely new database system.