by jjguy 2723 days ago
Graph databases are the NoSQL of this half decade. Move cautiously. Just because you conceptualize it in your mental model does not mean you need a graph database. Further, recognize that most (all?) implementations are not yet as performant or scalable as traditional data storage solutions.

Design your data schema first, then design your queries and finally your data lifecycle pipeline. Run some estimates on the order of magnitude for inserts, query rates, query types and storage sizes - then compare those numbers to the real-world perf of the various graphdb solutions. In general, compared to more typical solutions, you have more expensive inserts, query costs and storage sizes in exchange for more expressive queries. There aren't many applications where those cost tradeoffs make sense.

Source: Twice now (2012 and 2018) I've reviewed available graphdbs for storage of enterprise security data when doing the initial platform technology selection. Both times the team fell back onto more traditional approaches.
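For the "run some estimates" step above, a back-of-envelope calculator might look like the sketch below. It is only a sketch under assumptions: the Workload fields, the overhead_factor and peak_to_avg multipliers, and the sample numbers are all hypothetical, not figures from the reviews described here. The point is to get order-of-magnitude insert rates, query rates and storage sizes you can hold up against benchmark numbers you measure on each candidate database.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    # All fields are hypothetical inputs; plug in your own estimates.
    inserts_per_day: float
    queries_per_day: float
    bytes_per_record: float   # raw size of one stored record
    overhead_factor: float    # assumed multiplier for indexes, edges, replicas
    retention_days: float
    peak_to_avg: float = 1.0  # assumed ratio of peak rate to daily average


def estimate(w: Workload) -> dict:
    """Order-of-magnitude rates and storage for one workload."""
    avg_inserts = w.inserts_per_day / SECONDS_PER_DAY
    avg_queries = w.queries_per_day / SECONDS_PER_DAY
    return {
        "avg_inserts_per_sec": avg_inserts,
        "peak_inserts_per_sec": avg_inserts * w.peak_to_avg,
        "avg_queries_per_sec": avg_queries,
        "peak_queries_per_sec": avg_queries * w.peak_to_avg,
        # Total bytes retained, inflated by the assumed storage overhead.
        "storage_bytes": (w.inserts_per_day * w.retention_days
                          * w.bytes_per_record * w.overhead_factor),
    }


if __name__ == "__main__":
    # Illustrative numbers only; not taken from any real evaluation.
    w = Workload(inserts_per_day=864_000_000, queries_per_day=8_640_000,
                 bytes_per_record=200, overhead_factor=3.0,
                 retention_days=90, peak_to_avg=5.0)
    for name, value in estimate(w).items():
        print(f"{name}: {value:,.0f}")
```

With these made-up inputs it works out to 10,000 inserts/s on average (50,000 at peak), 100 queries/s on average, and roughly 47 TB of storage. If a candidate graph store can't sustain the peak figures on your hardware, that settles it before any schema work.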


I agree completely with this. Move cautiously. I personally found the entire space very immature.

Neo4j is the most mature solution I found (in the Java space). If you want to use something else, go for it, but you may be surprised at the low quality.

OP: I strongly recommend implementing most/all of your pipeline using both graph and non-graph approaches. Choose the graph approach iff you can demonstrate with hard evidence that it makes sense.
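One way to gather that "hard evidence" is a small harness that runs the same query through each approach, checks that they agree, and records timings. The sketch below assumes each backend is wrapped in a zero-argument callable; the compare function and the two toy implementations (a "two hops from node 1" query on a tiny in-memory edge list) are hypothetical stand-ins for real graph and relational backends.

```python
import time
from typing import Callable, Dict

EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (1, 6), (6, 4)]


def compare(impls: Dict[str, Callable[[], object]], repeat: int = 5) -> Dict[str, float]:
    """Run each implementation `repeat` times and return its best time in seconds.

    Raises ValueError if the implementations disagree, since a faster
    wrong answer is not evidence for either approach.
    """
    results, timings = {}, {}
    for name, fn in impls.items():
        best = float("inf")
        for _ in range(repeat):
            t0 = time.perf_counter()
            out = fn()
            best = min(best, time.perf_counter() - t0)
        results[name] = out
        timings[name] = best
    first = next(iter(results.values()))
    if any(r != first for r in results.values()):
        raise ValueError(f"implementations disagree: {results}")
    return timings


def via_adjacency():
    # Graph-style: build an adjacency map, then walk exactly two hops.
    adj = {}
    for src, dst in EDGES:
        adj.setdefault(src, set()).add(dst)
    return sorted({n2 for n1 in adj.get(1, ()) for n2 in adj.get(n1, ())})


def via_join():
    # Relational-style: self-join the edge "table" on e1.dst == e2.src.
    return sorted({e2[1] for e1 in EDGES if e1[0] == 1
                   for e2 in EDGES if e2[0] == e1[1]})


if __name__ == "__main__":
    print(compare({"adjacency": via_adjacency, "join": via_join}))
```

In a real evaluation the callables would hit the actual databases with production-sized data; the agreement check matters as much as the timings.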

+1. We take the maturity of RDB systems for granted, and GraphDBs make for a stark comparison.

Calculating over or walking over graphs sucks because there is usually a better, less brute-force way for any particular query.

Unless you have a set of use cases that require the ability to query across near enough random and unindexable subsets of a graph (e.g., Facebook), you'll probably be better off with a DB and a spot of flattening.
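As one reading of "a DB and a spot of flattening": store the graph as a plain, indexed edge table and answer bounded traversals in SQL. The sketch below uses Python's built-in sqlite3 module with a recursive CTE; the table name, index name and helper functions are hypothetical, and the commenter may well have meant a different kind of flattening (e.g. denormalizing).

```python
import sqlite3
from typing import Iterable, List, Tuple


def build(edges: Iterable[Tuple[int, int]]) -> sqlite3.Connection:
    """Load directed edges into an in-memory table indexed on the source node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
    conn.execute("CREATE INDEX edges_src ON edges (src)")
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", edges)
    return conn


def reachable_within(conn: sqlite3.Connection, start: int, max_hops: int) -> List[int]:
    """Nodes reachable from `start` in 1..max_hops hops, sorted."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            -- Each step is an indexed join on edges.src; depth bounds the walk.
            SELECT e.dst, w.depth + 1
            FROM walk AS w JOIN edges AS e ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT DISTINCT node FROM walk WHERE node != ? ORDER BY node
        """,
        (start, max_hops, start),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build([(1, 2), (2, 3), (2, 4), (3, 5), (1, 6), (6, 4)])
    print(reachable_within(conn, 1, 2))
```

The depth limit keeps the recursion finite even if the graph has cycles. For fixed-depth queries you can drop the CTE entirely and write the hops as ordinary self-joins.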

Same here. I've gotten used to moving thousands of records a second, even with MySQL's InnoDB on spinning metal. Then I tried Neo4j and, I think, one other product, and that was the end of my experience with graph DBs.

If I were interested in them again now, I would try new and fancy solutions first, to see if there are NoSQL-level performance improvements in the graph DB space.

> In general, compared to more typical solutions, you have more expensive ... query costs

Not to detract from your general point, but I'm curious whether you looked at Dgraph in your analysis. It's quite fast and was built for speed.

https://dgraph.io/

> Just because you conceptualize it in your mental model does not mean you need a graph database.

Yes! When I was younger, I once worked on a problem that needed some very basic graph metrics computed. My seniors tried to do the work in an early graph database, and it was a disaster. It turned out that literally just reading in the lines from a file and counting things got the job done in a few seconds.

They refused to use the results until they came out of the graph database, "just in case we needed other metrics." We never needed the other metrics.
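The "read the lines and count things" approach from the anecdote above can be as small as the sketch below. It assumes a hypothetical edge-list format of one whitespace-separated "src dst" pair per line, with blank lines and "#" comments skipped, and treats edges as undirected; for a directed graph you would count out-degree and in-degree separately.

```python
import io
from collections import Counter
from typing import Iterable


def degree_counts(lines: Iterable[str]) -> Counter:
    """Undirected degree per node from 'src dst' lines, streamed one at a time."""
    degrees = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        src, dst = line.split()[:2]  # assumes at least two fields per line
        degrees[src] += 1
        degrees[dst] += 1
    return degrees


if __name__ == "__main__":
    # In practice: with open("edges.txt") as f: degrees = degree_counts(f)
    sample = io.StringIO("a b\nb c\n# comment\nc a\na d\n")
    degrees = degree_counts(sample)
    print("nodes:", len(degrees))
    print("edges:", sum(degrees.values()) // 2)
    print("max degree:", degrees.most_common(1))
```

Because it streams the file line by line, memory grows with the number of distinct nodes, not the number of edges, which is why this kind of job finishes in seconds.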

Storage is very cheap these days.