| It's useful to separate graph-shaped problems from graph-db-shaped problems. From what we (Graphistry) see when working with folks here: 1. Graph-shaped, and generally fine without a graph DB: * You / your app wants to run some graph algorithms, the data fits in CPU/GPU memory, it already lives elsewhere, and it's easily stitched into a graph. We regularly do 1,000 to 1B nodes/edges on one GPU node. SQL/CSV/Parquet/Splunk/Spark query -> node+edge table -> ... . Ex: Correlating user journeys, mapping host/network IT/security log activity, analyzing bots, ... . * You want to visually explore the above as graphs/relationships/correlations (where we often come in for Graphistry). Having to manage 2 systems of record for the same data just to get some algorithmic/usability benefits is terrible, so I often recommend your regular DB + on-the-fly graph compute like the above. The upcoming security session of LearnRAPIDS.com will walk through some of this. 2. Graph search + graph enrichment, esp. on heterogeneous data or on > 1B nodes/edges. 2a. Graph query languages provide genericity not seen in normal SQL/NoSQL.
Ex: An analyst or an ML algorithm wants to get a 360 view of all data associated with some value, maybe a couple of hops out. There may be many types of data available. In SQL/NoSQL land, you need to know all the ways to pivot ahead of time (Users.id -> Customers.user_id --phone--> Calls.phone), and pray that the join queries don't tank the system, either as one-off queries or in throughput scenarios. 2b. Graph DB impls can efficiently run certain search queries that other DBs cannot. When your searches have extra fun patterns, like "between user A and user B, find all paths", and "Process A talks to Process B, which creates File C, which ...", this can be a big deal. Growth in either # of tables or # of rows makes these more important. 3. Graph management, whole-graph analytics, write-heavy: * DB management can be good for auth & locked-schema reasons even early on; part of why we did Neo4j early for ProjectDomino.org. * When working set sizes do start hitting say 100M or 1B, you may have a variety of queries where you don't want the overhead of going from scratch for everything (#1), esp. in a multi-user/service arch. * Likewise, when data grows to multi-node & write-heavy, you may want it always on. An ephemeral system can be good (no state!), but if writes are needed too and you don't want 2 systems, a graph DB may be a good fit. We get involved in all 3 categories of graph projects, so I'm happy to help. |
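
To make the "query -> node+edge table -> graph compute" flow from #1 concrete, here is a rough sketch in plain Python, not how Graphistry or any particular library does it. The sample rows, the column names, and the helpers `rows_to_edges` and `k_hop` are all made up for illustration. It stitches tabular log rows into an edge list on the fly, then pulls everything within a couple of hops of one value (the "360" case from 2a) without a second system of record.

```python
import csv
import io
from collections import defaultdict, deque

# Hypothetical log extract; in practice this would be the result of a
# SQL/CSV/Parquet/Splunk/Spark query.
RAW = """user,host,process
alice,web-1,nginx
bob,web-1,sshd
bob,db-1,psql
carol,db-1,psql
"""


def rows_to_edges(rows, column_pairs):
    """Turn each row into one edge per (source column, target column) pair."""
    edges = []
    for row in rows:
        for src_col, dst_col in column_pairs:
            # Prefix values with the column name so that identical strings
            # from different columns stay distinct nodes.
            edges.append((f"{src_col}:{row[src_col]}", f"{dst_col}:{row[dst_col]}"))
    return edges


def k_hop(edges, start, k):
    """Breadth-first search: map each node within k hops of start to its distance."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)  # treat edges as undirected for exploration
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand past the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


rows = list(csv.DictReader(io.StringIO(RAW)))
edges = rows_to_edges(rows, [("user", "host"), ("host", "process")])
neighborhood = k_hop(edges, "user:alice", 2)
for node, hops in sorted(neighborhood.items(), key=lambda item: (item[1], item[0])):
    print(hops, node)
```

Note that the column pairs still have to be listed up front here, which is exactly the "know all the ways to pivot ahead of time" cost from 2a. This approach is fine while the graph fits in memory and can be rebuilt per query; the points in #2 and #3 are about when that stops being true.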