
by lmeyerov 2203 days ago

It's useful to separate graph-shaped problems from graph-db-shaped problems. From what we (Graphistry) see when working with folks here:

1. Graph-shaped, and generally fine without a graph DB:

* You / your app wants to run some graph algorithms, the data fits in CPU/GPU memory, it already lives elsewhere, and it's easily stitched into a graph. We regularly do 1000-1B nodes/edges on one GPU node. SQL/CSV/Parquet/Splunk/Spark query -> node+edge table -> ... . Ex: correlating user journeys, mapping host/network IT/security log activity, analyzing bots, ... .

* You want to visually explore ^^^^ as graphs/relationships/correlations (where we often come in for Graphistry)
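A minimal sketch of the "query -> node+edge table -> graph algorithm" flow from #1, using only the Python standard library. The `flows` table, its `src`/`dst` columns, and the host names are hypothetical stand-ins for whatever log source you already have, and union-find connected components stands in for "some graph algorithm". The graph exists only for the duration of the computation; the DB stays the single system of record.

```python
import sqlite3
from collections import defaultdict


def load_edges(conn):
    # Hypothetical schema: a "flows" table of host-to-host log activity.
    # Each row becomes one edge in the on-the-fly graph.
    return [(src, dst) for src, dst in conn.execute("SELECT src, dst FROM flows")]


def connected_components(edges):
    # Union-find over node ids; nodes are registered lazily as edges are seen.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    # Deterministic output: each component sorted, components ordered by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flows (src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO flows VALUES (?, ?)",
        [("h1", "h2"), ("h2", "h3"), ("h4", "h5")],
    )
    print(connected_components(load_edges(conn)))  # [['h1', 'h2', 'h3'], ['h4', 'h5']]
```

At real scale the same shape holds, with a dataframe or GPU library doing the table -> graph step instead of a Python loop.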

Having to manage 2 systems of record for the same data just to get some algorithmic/usability benefits is terrible, so I often recommend sticking with your regular DB + on-the-fly graph compute like ^^^^.

The upcoming security session of LearnRAPIDS.com will walk through some of this.

2. Graph search + graph enrichment, esp. on heterogeneous data or on > 1B nodes/edges.

2a. Graph query languages provide a genericity not seen in normal SQL/NoSQL. Ex: An analyst or an ML algorithm wants a 360-degree view of all data associated with some value, maybe a couple of hops out, and many types of data may be available. In SQL/NoSQL land, you need to know all the ways to pivot ahead of time (Users.id -> Customers.user_id --phone--> Calls.phone), and pray that the join queries don't tank the system, whether as one-off queries or in throughput scenarios.
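To make the contrast concrete: in SQL, each pivot in a chain like Users.id -> Customers.user_id --phone--> Calls.phone is a join you spell out in advance. Once the data is stitched into a graph, the "360" becomes a generic k-hop traversal that never names the join path. The sketch below is plain Python standing in for a graph query language, not any graph DB's engine; nodes are hypothetical `(type, value)` pairs, so users, customers, phones, and calls all live in one adjacency map.

```python
from collections import defaultdict, deque


def build_graph(edges):
    # Undirected adjacency over (type, value) nodes of any kind.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighborhood(adj, seed, max_hops):
    # Breadth-first search: every node within max_hops of seed, mapped to its hop count.
    # Nothing here knows which tables or columns connect to which.
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    # Hypothetical rows mirroring the Users -> Customers -> phone -> Calls pivot chain.
    edges = [
        (("user", "u1"), ("customer", "c1")),
        (("customer", "c1"), ("phone", "555-0100")),
        (("call", "call-9"), ("phone", "555-0100")),
        (("call", "call-9"), ("phone", "555-0199")),
    ]
    adj = build_graph(edges)
    print(neighborhood(adj, ("user", "u1"), 3))
```

Adding a new kind of data means adding edges, not rewriting the query.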

2b. Graph DB impls can efficiently run certain search queries that other DBs cannot. When your searches have extra fun patterns, like "between user A and user B, find all paths" or "Process A talks to Process B, which creates File C, which ...", this can be a big deal.
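For a sense of why "find all paths between A and B" is hard to express and expensive elsewhere, here is a naive depth-first enumeration of simple paths (no repeated nodes). The `max_len` hop cap is an assumption added to keep it bounded: without one, the number of paths can grow exponentially with depth and degree. This is a teaching sketch only; it does not reflect how any particular graph DB plans or executes path queries.

```python
from collections import defaultdict


def build_adjacency(edges):
    # Undirected adjacency map from an edge list.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def all_simple_paths(adj, start, goal, max_len):
    """Yield every simple path from start to goal with at most max_len edges."""
    path = [start]
    on_path = {start}

    def dfs(node):
        if node == goal:
            yield list(path)  # copy, since path keeps mutating
            return
        if len(path) - 1 == max_len:
            return  # hop budget exhausted
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                yield from dfs(nxt)
                path.pop()  # backtrack
                on_path.remove(nxt)

    yield from dfs(start)


if __name__ == "__main__":
    adj = build_adjacency([("alice", "x"), ("alice", "y"), ("x", "bob"), ("y", "bob"), ("x", "y")])
    for p in all_simple_paths(adj, "alice", "bob", max_len=3):
        print(" -> ".join(p))
```

Even this 4-node toy has 4 alice-to-bob paths within 3 hops; each extra hop or cross-link multiplies the work, which is why native support for such patterns matters at scale.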

Growth in either the number of tables or the number of rows makes these more important.

3. Graph management, whole-graph analytics, write-heavy

* DB management can be good for auth & locked-schema reasons even early on; that's part of why we went with Neo4j early for ProjectDomino.org.

* When working-set sizes do start hitting, say, 100M or 1B, you may have a variety of queries where you don't want the overhead of rebuilding the graph from scratch each time (#1), esp. in a multi-user/service arch.

* Likewise, when data grows to multi-node & write-heavy, you may want it always on. An ephemeral system can be good (no state!), but if writes are needed too and you don't want 2 systems, a graph DB may be a good fit.
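A toy, in-process illustration of the tradeoff in these bullets. Rebuilding from the source rows (the #1 style) pays O(E) work on every query, while a long-lived graph pays it once and then absorbs each write by touching only two adjacency entries. `LiveGraph` and `degree_from_scratch` are hypothetical names; this sketch has none of the persistence, auth, schema, or multi-node concerns that are the actual reasons to reach for a graph DB.

```python
from collections import defaultdict


class LiveGraph:
    """Toy long-lived graph: built once, then updated in place by writes."""

    def __init__(self, edges):
        self.adj = defaultdict(set)
        for a, b in edges:
            self.add_edge(a, b)

    def add_edge(self, a, b):
        # A write touches only the two endpoints instead of rebuilding everything.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def degree(self, node):
        # .get avoids inserting unknown nodes into the defaultdict.
        return len(self.adj.get(node, ()))


def degree_from_scratch(edges, node):
    # The #1 approach: re-derive the whole graph from the source rows for every query.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return len(adj.get(node, ()))


if __name__ == "__main__":
    rows = [("a", "b"), ("a", "c")]
    g = LiveGraph(rows)
    g.add_edge("a", "d")                              # incremental write
    print(g.degree("a"))                              # served from live state
    print(degree_from_scratch(rows + [("a", "d")], "a"))  # same answer, full rebuild
```

Both answer the same question; the difference is where the rebuild cost lands and who owns the state afterwards.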

We get involved in all 3 categories of graph projects, and I'm happy to help.