| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jandrewrogers 5104 days ago

Good comment.

I was using Graph500 as a decently documented public example more than the only example. There are other problems based on real-world data in the trillion edge range that serve as "hello world" models for testing massively parallel graph algorithms. Directed and undirected, cyclic, and acyclic, properties and property-free. Semantic databases and entity analytics are popular test cases.

In the specific case of Graph500, the graph is significantly cyclic which creates coordination issues if you simply denormalize the data (e.g. replicating edges around a graph cut). Being able to do a massively parallel BFS from any vertex in the graph and producing the globally correct result without replicating edges means that you cannot know how to optimize the organization ahead of time. This was an intentional part of the benchmark design. The Graph 500 does not lend itself to optimizing for a particular set of adaptive graph cuts in any conventional sense; the algorithms used need to be general over the 64 randomized runs and that benchmark was designed to favor non-replicated edges when using massively parallel systems (the coordination costs of edge replicas will kill your performance). However, obviously the massively parallel systems are partitioning up the graph in some sense.

In the specific case of the work I did a couple years ago, the systems can ingest tens of millions of new edges per second concurrent with parallel queries (not serializable multi-statement transactions, obviously). The ingest structure can be identical to the structure against which ad hoc queries are run without any kind of query optimization. The fact that ingest rates that high are sustained effectively precludes dynamically reorganizing the data to satisfy particular queries more optimally. In truth, it could be made more optimal for batch-y type workloads (maybe 2-3x faster versus the dynamic version?) but the point was to be able to throw massive amounts of hardware at arbitrary graph data models rather than optimizing it for a specific query.

BTW, metric space embedding is non-trivial algorithmically but can also be computationally inexpensive. The Macbook Air I am using now can do tens of millions of those embedding operations per second on a single core for moderately complex spaces and data models. Maybe an order of magnitude or two slower if dealing with complex, non-Euclidean geometries. However, I also spent a couple years of computer science R&D developing the algorithms to make that fast. :-) I have been working in this particular area for a bit over half a decade now so my perspective takes some things for granted I think. There isn't just one problem you have to solve, there are actually several if you are starting from scratch.

Like I said, I didn't want to take anything away from Titan and true OLTP-oriented systems have their own complex problems, not the least of which is that they don't scale too far beyond a couple hundred compute nodes for the current state-of-the-art. Not my specialty. I work in a world of more basic consistency guarantees.

Cheers!