by kraemate 1966 days ago
I have always wondered what common use cases such systems address. How large are the graphs typically? Are graphs really so big that they don't fit on a single machine?

When you are dealing with payments data (x paying y) covering a few months of transaction volume, the resulting graphs are very large. Companies such as NPCI or Alipay deal with this kind of data.
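A minimal sketch of how raw payment records might be turned into such a graph, assuming each record is a hypothetical `(payer, payee, amount)` tuple. Repeated payments between the same pair are collapsed into one weighted edge; at NPCI or Alipay scale this aggregation would itself run as a distributed job, which this in-memory version does not attempt.

```python
def build_payment_graph(transactions):
    """Aggregate (payer, payee, amount) records into a weighted directed graph.

    Returns {payer: {payee: (txn_count, total_amount)}}. The function name and
    record layout are illustrative assumptions, not any real system's schema.
    """
    graph = {}
    for payer, payee, amount in transactions:
        edges = graph.setdefault(payer, {})
        count, total = edges.get(payee, (0, 0.0))
        # Collapse repeated payer -> payee payments into a single weighted edge.
        edges[payee] = (count + 1, total + amount)
    return graph


txns = [("alice", "bob", 10.0), ("alice", "bob", 5.0), ("bob", "carol", 7.5)]
print(build_payment_graph(txns))
```

Even after aggregation, the edge count grows with the number of distinct payer/payee pairs, which is what makes a few months of data so large.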

Some use cases for building such a graph are computing node embeddings for fraud prevention, link prediction, etc.
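Node embeddings are beyond a short snippet, but the link-prediction task itself can be illustrated with a classic neighborhood heuristic. This is a swapped-in technique (Jaccard similarity of neighbor sets), not the embedding approach mentioned above, and the helper names are hypothetical.

```python
def undirected_adjacency(edges):
    """Build {node: set(neighbors)} from (u, v) pairs, ignoring direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(adj, candidates):
    """Score candidate (u, v) links by |N(u) & N(v)| / |N(u) | N(v)|.

    Higher scores suggest a link is more likely; an embedding-based model
    would replace this score with, e.g., a similarity of learned vectors.
    """
    scores = {}
    for u, v in candidates:
        nu, nv = adj.get(u, set()), adj.get(v, set())
        union = nu | nv
        scores[(u, v)] = len(nu & nv) / len(union) if union else 0.0
    return scores


adj = undirected_adjacency([("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("d", "e")])
print(jaccard_scores(adj, [("a", "d"), ("a", "e")]))
```

Here `a` and `d` share two of three neighbors, so they score far higher than `a` and `e`, which share none.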

Indeed. Link prediction and fraud detection are two of the many things we support internally at Alibaba.

Transaction graphs can be huge, and so can web graphs. There are also many other huge graphs and use cases, like spotting irregularities or intrusions in network traffic graphs at Alibaba Cloud. Some bioinformatics algorithms also require the ability to process big graphs.
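As a toy illustration of "spotting irregularities" in a traffic graph, one simple signal is a host whose fan-out (number of distinct destinations) is far above its peers, as a port or host scanner's would be. This is an assumed example, not how Alibaba Cloud does it: the `(src, dst)` flow format, the function name, and the median + k·MAD threshold are all illustrative choices.

```python
from statistics import median


def flag_high_fanout(flows, k=5.0):
    """Return sources whose distinct-destination count is an outlier.

    flows: iterable of (src, dst) pairs (hypothetical flow-log format).
    Threshold: median + k * MAD, with MAD floored at 1 so that a perfectly
    uniform population does not yield a zero-width band.
    """
    dests = {}
    for src, dst in flows:
        dests.setdefault(src, set()).add(dst)
    fanout = {src: len(d) for src, d in dests.items()}
    values = list(fanout.values())
    med = median(values)
    mad = median(abs(x - med) for x in values)
    threshold = med + k * max(mad, 1.0)
    return sorted(src for src, n in fanout.items() if n > threshold)


normal = [(f"h{i}", f"d{j}") for i in range(10) for j in range(2)]
scan = [("scanner", f"d{j}") for j in range(50)]
print(flag_high_fanout(normal + scan))
```

On real traffic the degree computation is the part that needs a distributed engine; the thresholding step is cheap once the per-host degrees exist.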

GraphScope targets computations on extremely large graphs, e.g., the friendship network of all Facebook users, the hyperlink relations between all web pages in the world, and so on.

Typically the graph may have billions of nodes and tens of billions of edges. Obviously such graph data cannot fit on a single machine to run algorithms like SSSP or PageRank. A single machine also usually doesn't have enough cores for the computation, so, e.g., an interactive query couldn't return within milliseconds. That's why we need a distributed graph processing system for such big graphs.
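To make the "distributed" part concrete, here is a single-process sketch of PageRank organized the way a vertex-partitioned system typically runs it: vertices are assigned to workers (here by `v % num_workers`, an assumed partitioning), each superstep sends rank contributions as messages to the owning worker, and each worker updates only its own vertices. It is not GraphScope's implementation or API; names and defaults are illustrative.

```python
def partitioned_pagerank(edges, num_vertices, num_workers=4, damping=0.85, iters=50):
    """PageRank over vertices 0..num_vertices-1, simulating per-worker state.

    Dangling vertices (no out-edges) spread their rank uniformly; in a real
    cluster that sum would be a global aggregation across workers.
    """
    def owner(v):
        return v % num_workers  # assumed hash partitioning

    out_edges = [[] for _ in range(num_vertices)]
    for u, v in edges:
        out_edges[u].append(v)

    # Each worker only stores the ranks of the vertices it owns.
    ranks = [{v: 1.0 / num_vertices for v in range(num_vertices) if owner(v) == w}
             for w in range(num_workers)]

    for _ in range(iters):
        # Phase 1: send contributions along out-edges into per-worker inboxes.
        inbox = [{} for _ in range(num_workers)]
        dangling = 0.0
        for w in range(num_workers):
            for u, r in ranks[w].items():
                if not out_edges[u]:
                    dangling += r
                    continue
                share = r / len(out_edges[u])
                for v in out_edges[u]:
                    box = inbox[owner(v)]
                    box[v] = box.get(v, 0.0) + share
        # Phase 2: each worker updates its own vertices from received messages.
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        for w in range(num_workers):
            for v in ranks[w]:
                ranks[w][v] = base + damping * inbox[w].get(v, 0.0)

    return {v: r for part in ranks for v, r in part.items()}


print(partitioned_pagerank([(1, 0), (2, 0), (3, 0), (0, 1)], num_vertices=4, num_workers=2))
```

The cross-worker messages in phase 1 are exactly the network traffic a real cluster pays for, which is why partitioning quality matters so much at billions of edges.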

Way back when I was working on parallel graph engines, trillions of edges were pretty typical. The kinds of data models that produce graphs that large are pretty diverse.