by batmansmk
2002 days ago
It really depends on what you want to do with it. I would benchmark the tasks "traversal", "aggregation" and "shortest path" on graphs of 10k to 10M nodes.
Anything under 10k nodes would be fast enough with most technologies. Over 10M, you need to consider more tasks: writes, backups, and the specific fields being queried can each become a problem at larger scale. The GitHub link implements "traversal" in Python instead of in pure SQLite. I suspect it is around 10x slower than it could be with the same tech stack, because it sends a separate query from Python to SQLite for every node.
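To make the per-node cost concrete, here is a minimal sketch (not the linked repo's code) that assumes a hypothetical `edges(src, dst)` table. `reachable_per_node` mimics a Python-driven traversal that sends one `SELECT` per visited node. `reachable_single_query` pushes the same traversal into SQLite as a single `WITH RECURSIVE` query, so the whole walk runs inside the database engine.

```python
import sqlite3


def make_db(edges):
    """Build an in-memory graph; the edges(src, dst) schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX edges_src ON edges (src)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    return conn


def reachable_per_node(conn, start):
    """Python-driven traversal: one SELECT round trip per visited node."""
    seen, stack, queries = {start}, [start], 0
    while stack:
        node = stack.pop()
        queries += 1
        for (nxt,) in conn.execute("SELECT dst FROM edges WHERE src = ?", (node,)):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen, queries


def reachable_single_query(conn, start):
    """Same traversal as one recursive CTE evaluated entirely inside SQLite."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops repeated rows, so cycles terminate
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    )
    return {node for (node,) in rows}


conn = make_db([(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)])
print(reachable_per_node(conn, 1))      # ({1, 2, 3, 4}, 4)
print(reachable_single_query(conn, 1))  # {1, 2, 3, 4}
```

Both return the same node set; the difference is that the first version pays one Python-to-SQLite round trip per visited node (4 here), while the second pays one in total.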
Shortest path is not implemented and would be too slow to be useful in an interactive environment. "Aggregation" is also not implemented, but it would perform admirably, because SQL is good at that. Traditional relational OLTP databases such as Postgres are already faster than dedicated graph databases on certain graph-related tasks, according to this benchmark: https://www.arangodb.com/2018/02/nosql-performance-benchmark...
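As an illustration of why aggregation is the easy case, here is a sketch with hypothetical `nodes(id, kind)` and `edges(src, dst)` tables: per-node out-degree and in-degree grouped by the target's kind are each just a join plus `GROUP BY`, which relational engines are built to execute well.

```python
import sqlite3

# Hypothetical schema: nodes carry a "kind" property, edges are directed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "user"), (2, "user"), (3, "page"), (4, "page")])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 3), (1, 4), (2, 3), (2, 1)])

# Out-degree of every node; the LEFT JOIN keeps nodes with no outgoing edges.
out_degree = dict(conn.execute("""
    SELECT n.id, COUNT(e.dst)
    FROM nodes AS n LEFT JOIN edges AS e ON e.src = n.id
    GROUP BY n.id
    ORDER BY n.id
"""))

# Number of incoming edges, grouped by the kind of the target node.
in_degree_by_kind = dict(conn.execute("""
    SELECT n.kind, COUNT(*)
    FROM edges AS e JOIN nodes AS n ON n.id = e.dst
    GROUP BY n.kind
    ORDER BY n.kind
"""))

print(out_degree)         # {1: 2, 2: 2, 3: 0, 4: 0}
print(in_degree_by_kind)  # {'page': 3, 'user': 1}
```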
It is indeed quite common that relational databases outperform graph databases on certain graph processing problems such as subgraph queries (a.k.a. graph pattern matching). There are two key reasons for this: (1) most graph pattern matching operations can be formulated using relational operations such as natural joins, antijoins, and outer joins; and (2) relational databases have been around longer and have well-optimized operators.
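A small sketch of point (1), assuming an undirected graph stored once per edge in a hypothetical `edges(src, dst)` table with `src < dst`. The `adj` view lists both directions. A triangle pattern then becomes a three-way self-join on shared variables, and a pattern with a required missing edge (an "open wedge") becomes an antijoin expressed with `NOT EXISTS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed convention: undirected graph, each edge stored once with src < dst.
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, CHECK (src < dst))")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
# Symmetric view so patterns can follow an edge in either direction.
conn.execute("""
    CREATE VIEW adj AS
    SELECT src AS u, dst AS v FROM edges
    UNION ALL
    SELECT dst, src FROM edges
""")

# Triangle (a)-(b)-(c)-(a): three copies of adj joined on shared variables.
# Requiring a < b < c reports each triangle exactly once.
TRIANGLES = """
    SELECT x.u, x.v, y.v
    FROM adj AS x
    JOIN adj AS y ON y.u = x.v
    JOIN adj AS z ON z.u = y.v AND z.v = x.u
    WHERE x.u < x.v AND x.v < y.v
    ORDER BY 1, 2, 3
"""

# Open wedge p-(mid)-q with NO edge between p and q: NOT EXISTS is an antijoin.
OPEN_WEDGES = """
    SELECT x.u AS mid, x.v AS p, y.v AS q
    FROM adj AS x
    JOIN adj AS y ON y.u = x.u AND x.v < y.v
    WHERE NOT EXISTS (SELECT 1 FROM adj AS z WHERE z.u = x.v AND z.v = y.v)
    ORDER BY mid, p, q
"""

print(conn.execute(TRIANGLES).fetchall())    # [(1, 2, 3), (2, 3, 4)]
print(conn.execute(OPEN_WEDGES).fetchall())  # [(2, 1, 4), (3, 1, 4), (4, 2, 5), (4, 3, 5)]
```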
A lot of the value that graph databases provide lies in their query languages, which (in most systems) let you express path queries with a concise syntax, unlike SQL's WITH RECURSIVE, which many people find difficult to read and write. Their property graph data model supports a schema-optional approach, which makes them better suited for storing semi-structured data. They also "provide efficient programmatic access to the graph, allowing one to write arbitrary algorithms against them if needed" [1].
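For comparison, this is roughly what a hop-count shortest path looks like with `WITH RECURSIVE` in SQLite. It is a sketch over a hypothetical directed `edges(src, dst)` table; `max_hops` is an assumed depth cutoff, and the path is carried along as a comma-separated string so the query can avoid revisiting nodes. Because it enumerates simple paths up to `max_hops` before picking the shortest one, its cost can grow very quickly on dense graphs.

```python
import sqlite3

# Hop-count shortest path; the CTE enumerates simple paths up to :max_hops.
SHORTEST_PATH = """
WITH RECURSIVE walk(node, hops, path) AS (
    SELECT :src, 0, ',' || :src || ','
    UNION ALL
    SELECT e.dst, w.hops + 1, w.path || e.dst || ','
    FROM walk AS w JOIN edges AS e ON e.src = w.node
    WHERE instr(w.path, ',' || e.dst || ',') = 0  -- skip nodes already on this path
      AND w.hops < :max_hops                      -- assumed depth cutoff
)
SELECT path FROM walk WHERE node = :dst ORDER BY hops LIMIT 1
"""


def shortest_path(conn, src, dst, max_hops=10):
    """Return the node list of one shortest path from src to dst, or None."""
    row = conn.execute(
        SHORTEST_PATH, {"src": src, "dst": dst, "max_hops": max_hops}
    ).fetchone()
    if row is None:
        return None
    # ",1,5,4," -> [1, 5, 4]
    return [int(n) for n in row[0].strip(",").split(",")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4), (4, 6)])
print(shortest_path(conn, 1, 4))  # [1, 5, 4]
```

A Cypher-style language would express the same intent in a single pattern line, which is the readability gap described above.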
With all that said, graph databases could be much faster than relational databases on subgraph queries, and there are recent research results on the topic (worst-case optimal joins, A+ indexes, etc.). However, these are not yet available in any production system.
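To give a flavor of the worst-case optimal join idea, here is a toy Python sketch of the variable-at-a-time strategy behind generic join, applied to the triangle query. It is not how any particular system implements it. It binds `a`, then `b`, and finds candidates for `c` by intersecting the neighbor sets that constrain `c`, instead of first materializing every 2-hop path the way a plan of binary joins would. Edges are assumed undirected and are oriented from lower to higher id.

```python
from collections import defaultdict


def triangles_generic_join(edges):
    """Toy variable-at-a-time (generic-join-style) triangle listing."""
    # Orient each undirected edge from lower to higher id: fwd[a] = {b : a < b}.
    fwd = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            fwd[min(u, v)].add(max(u, v))
    out = []
    for a in sorted(fwd):              # bind a
        for b in sorted(fwd[a]):       # bind b, constrained by edge (a, b)
            # bind c: it must satisfy edge (a, c) AND edge (b, c), so intersect
            # the two candidate sets rather than enumerating all 2-hop paths.
            for c in sorted(fwd[a] & fwd.get(b, set())):
                out.append((a, b, c))
    return out


print(triangles_generic_join([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]))
# [(1, 2, 3), (2, 3, 4)]
```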
[1] http://wp.sigmod.org/?p=1497