by emileifrem
2785 days ago
Indeed, and Neo4j has scaled out horizontally for reads since 2011. However, the really hard problem with scaling graphs is scaling writes, i.e. partitioning. In my mind, no one has solved that well today because you can't just slap it on top of a partitioning algorithm that's designed for data without explicit relationships (think documents, key-value pairs). In order to gain something other than a checkbox feature and "sharding claims" you have to partition based on the shape of the graph at the time of insertion, but also revise it continuously as the graph evolves over time. That's a non-trivial problem that no one has solved today. (Yes, we're obviously working on it here at Neo4j.) The good news is that you can get really, really far with the replicated horizontal scale-out model. "Big RAM is growing faster than big data," as you've probably heard, and today there are massive Neo4j deployments in production using our third-generation scale-out architecture (Raft-based, multi-clustering, causal consistency).
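To make the partitioning point above concrete, here is a minimal sketch, not anything from Neo4j. It counts how many relationships cross a partition boundary under two placements of a toy graph. The graph, the node names, and the round-robin placement (standing in for any scheme that ignores relationships) are all assumptions made up for illustration.

```python
def edge_cut(edges, assignment):
    """Count edges whose endpoints live on different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Hypothetical toy graph: two fully connected communities joined by one edge.
community_a = ["a1", "a2", "a3", "a4"]
community_b = ["b1", "b2", "b3", "b4"]
nodes = community_a + community_b
edges = [
    (u, v)
    for group in (community_a, community_b)
    for i, u in enumerate(group)
    for v in group[i + 1:]
]
edges.append(("a1", "b1"))

# Relationship-blind placement: round-robin over insertion order,
# a stand-in for key-based schemes that never look at the edges.
blind = {node: index % 2 for index, node in enumerate(nodes)}

# Shape-aware placement: each community stays on one partition.
shape_aware = {node: 0 for node in community_a}
shape_aware.update({node: 1 for node in community_b})

print("blind cut:", edge_cut(edges, blind))
print("shape-aware cut:", edge_cut(edges, shape_aware))
```

Every cut edge is a traversal that has to hop between machines. The sketch only covers a static graph; the comment's harder requirement, revising the placement as the graph evolves, is not modeled here.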
I don't understand this comment.
Last year, the MIT team showed the GraphBLAS/D4M model can achieve 100M inserts per second on a cluster [1], and it's improved since then.
The GraphBLAS [2] standard has been in the works for more than 10 years. It's the culmination of the initial D4M matrix model design by Jeremy Kepner [3] and his team at MIT Lincoln Laboratory Supercomputing Center, which models graphs in the language of linear algebra.
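As a rough illustration of what "graphs in the language of linear algebra" means, here is a sketch of breadth-first search written as repeated sparse vector-matrix products over a boolean (OR, AND) semiring. It is plain Python and does not use the GraphBLAS API; the dict-of-sets adjacency structure and the function name `bfs_levels` are assumptions chosen for this example.

```python
def bfs_levels(adjacency, source):
    """Return {node: hop count from source}.

    adjacency maps each node to the set of its out-neighbours, i.e. the
    non-zero columns of that node's row in a sparse adjacency matrix A.
    """
    levels = {source: 0}
    frontier = {source}  # sparse boolean vector: the set of non-zero entries
    depth = 0
    while frontier:
        depth += 1
        # Vector-matrix product over (OR, AND): OR together the rows of A
        # selected by the non-zero entries of the frontier vector.
        reached = set()
        for node in frontier:
            reached |= adjacency.get(node, set())
        # Mask step: keep only entries for nodes that have not been visited.
        frontier = {node for node in reached if node not in levels}
        for node in frontier:
            levels[node] = depth
    return levels

# Hypothetical five-node directed graph.
adjacency = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
print(bfs_levels(adjacency, 0))
```

The appeal of this framing is that the traversal reduces to one primitive, a masked sparse product, which is the kind of operation hardware and library authors can optimize once and reuse across many graph algorithms.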
And for the last ~5 years or so the GraphBLAS software model has been designed in collaboration with hardware teams at Intel, NVIDIA, IBM, and the labs to make chips and architectures optimized for these new matrix models and capable of scaling out to exascale [4].
From a query perspective the GraphBLAS model is even better now that GPU and TPU accelerators are populating the data centers since it means you can now run local graph queries and global graph analytics algorithms on the same system and both return results in less than a second.
* Google Cloud GPUs https://cloud.google.com/gpu/
* Google Cloud TPUs https://cloud.google.com/tpu/
For an overview of GraphBLAS in the context of Heterogeneous High-Performance Computing (HHPC) systems running on NVIDIA GPUs and Intel Xeon Phis, see the 2015 talk Scott McMillan [5] gave at the CMU Software Engineering Institute [6].
GraphBLAS is in RedisGraph now [7] -- it uses the official GraphBLAS C implementation written by Tim Davis [8], who as you know implements the underlying sparse matrix algos used in everything from MATLAB to Google Maps -- see his recent talk:
* RedisGraph in the Language of Linear Algebra with GraphBLAS, co-presented by RedisLabs and Tim Davis [video] https://www.youtube.com/watch?v=xnez6tloNSQ
Adoption of the GraphBLAS standard by hardware chip manufacturers is a sign that the golden age of graphs is upon us -- the graph hardware/software model has effectively been solved, and the preceding wave created by AI/deep-learning demand led the way for the emergence of GPU/TPU accelerators in the cloud -- all the forces have aligned.
But for whatever reason, I haven't heard anything about Neo4j in terms of GraphBLAS. What's Neo4j's official position on this and the adoption of the GraphBLAS standard?
I realize it's a big change and would require a big architectural overhaul, but it's been in the works for 10 years, and most of the vendors have been involved for the last 5 years.
Has Neo4j been involved in the GraphBLAS design process and/or are you moving toward adopting the standard?
---//---
[1] Achieving 100M database inserts per second using Apache Accumulo and D4M [pdf] http://www.ieee-hpec.org/2014/CD/index_htm_files/FinalPapers...
Previous Discussion: https://news.ycombinator.com/item?id=13465141
[2] GraphBLAS Standard http://graphblas.org
[3] Jeremy Kepner http://www.mit.edu/~kepner/
[4] GraphBLAS: Building Blocks For High Performance Graph Analytics https://crd.lbl.gov/news-and-publications/news/2017/graphbla...
[5] Scott McMillan https://insights.sei.cmu.edu/author/scott-mcmillan/
[6] Graph Algorithms on Future Architectures [video] https://www.youtube.com/watch?v=-sIdS4cz7-4
[7] RedisGraph https://oss.redislabs.com/redisgraph/
[8] Tim Davis http://faculty.cse.tamu.edu/davis/
Previous discussion: https://news.ycombinator.com/item?id=18081978