| Suppose you have a graph with edge weights. Query: find all the nodes a distance < D from some reference node. With SQL92 or earlier, this will require an unbounded number of queries. Proof: A SQL92 query can only traverse a finite number of edges, since it has only a finite number of joins (say K). Consider the graph 1->2->3->...->N with edge weights 1/N. This will require N/K queries to traverse. N is arbitrary. It's probably doable in SQL99 and various vendor specific extensions (I think it's Turing Complete), but it is unlikely to be fast. So, as is the normal situation, you can do whatever you want in SQL. But the performance is likely to suck. (That said, I agree with you that many people are using NoSQL because it's the hip new thing, when they should just use SQL. "Schemaless flexibility" is BS - SQL is very flexible if you use an ORM + migration tool. The only flexibility you lose is the flexibility to screw up the data in application logic.) |
WITH RECURSIVE r AS (SELECT parent, child, distance FROM graph UNION ALL SELECT r.parent, graph.child, r.distance + graph.distance AS distance FROM graph, r WHERE graph.parent = r.child) SELECT parent, child FROM r WHERE distance < 3.14159;
Testing your graph with N=1000 on a teeny VPS. 477,897 rows returned in 6.1s.
If we move the WHERE clause inside the sub-SELECT (which unfortunately makes the query slightly more awkward to use), selecting lower distances reduces the compute time. For example, a bound of 0.5 returns 196,238 rows in 2.5s.
What sort of performance gains does a graph database give for this same query?