by zozbot234 94 days ago
What do you need one trillion edges for? Wikidata is a huge, general purpose knowledge graph and it gets away with ~1B triples, give or take.

Almost all analytic graphs of general scope surpass 1T edges; see the examples below. DARPA also has an unfilled objective for real-time, continuously updated operational graphs at 1B edges. These are smaller, and their write throughput requirements are in line with non-graph analytical databases, but graph databases struggle to meet that standard.

There are countless smaller graphs for narrow domains that may be <1B edges, but many people have the ambition to stitch these narrow graphs together into a larger graph. When stitching graphs together, the edge count usually grows super-linearly, because new relationships appear between entities that came from different source graphs. A billion edges is more or less considered the “Hello World” of system testing.
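To make the super-linear point concrete, here is a toy model, not a measurement. It assumes every pair of source graphs contributes a fixed number of new cross-graph edges once stitched; the function name and the numbers are made up for illustration.

```python
def stitched_edge_count(num_graphs, edges_per_graph, cross_edges_per_pair):
    """Toy estimate of total edges after stitching num_graphs domain graphs.

    Assumption (hypothetical): each pair of source graphs gains a fixed
    number of new cross-graph edges when stitched together.
    """
    internal = num_graphs * edges_per_graph        # edges the graphs already had
    pairs = num_graphs * (num_graphs - 1) // 2     # number of graph pairs
    return internal + pairs * cross_edges_per_pair


# Doubling the number of stitched graphs more than doubles the edge count.
for k in (1, 2, 4, 8, 16):
    print(k, stitched_edge_count(k, 1_000_000, 250_000))
```

Under this assumption the internal edges grow linearly with the number of graphs while the cross-graph edges grow quadratically, so the cross-graph term dominates quickly. Real stitching can be better or worse than this depending on how much the domains overlap.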

The Semantic Web companies in the 2000s had graphs that were 100B+ edges. They wanted to go much larger but hit hard scaling walls around that point. That scaling wall killed them.

Classic mapping data models are typically 10-100B edges. These could be much, much larger if they could process all the data available to them.

Of course, intelligence agencies had all kinds of graphs far beyond trillions of edges 20 years ago. People, places, things, events.

Spatiotemporal entity graphs with large geographic scope run to quadrillions of edges. It isn’t just that there are a lot of inferred relationships between entities; those relationships also evolve over time, and that evolution must be captured too. These are probably the most commercially valuable type of graph. You could build hundreds of different graphs of this type with 1T+ edges in most regions, never mind doing it at scale. These are so large that we usually don’t store them. Instead, subgraphs are generated on demand, which is computationally expensive.
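A minimal sketch of what "generated on demand" can mean, under heavy assumptions: raw observations are stored, and an edge is inferred only at query time when two different entities are seen close together in space and time. The `Observation` type, the proximity rule, and all thresholds are hypothetical, and the pairwise scan is deliberately naive.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Observation:
    entity: str   # hypothetical entity identifier
    x: float      # position in arbitrary planar units
    y: float
    t: float      # timestamp in arbitrary units


def subgraph_on_demand(observations, region, t_range, max_dist, max_dt):
    """Infer co-location edges for one region and time window at query time.

    Nothing is materialized ahead of time: edges exist only in the result.
    """
    x0, y0, x1, y1 = region
    t0, t1 = t_range
    selected = [
        o for o in observations
        if x0 <= o.x <= x1 and y0 <= o.y <= y1 and t0 <= o.t <= t1
    ]
    edges = set()
    # Naive O(n^2) pairwise comparison; this is the expensive part.
    for a, b in combinations(selected, 2):
        if a.entity == b.entity:
            continue
        close_in_time = abs(a.t - b.t) <= max_dt
        close_in_space = hypot(a.x - b.x, a.y - b.y) <= max_dist
        if close_in_time and close_in_space:
            u, v = sorted((a.entity, b.entity))
            edges.add((u, v, min(a.t, b.t)))  # timestamped, undirected edge
    return edges


observations = [
    Observation("truck-1", 0.0, 0.0, 10.0),
    Observation("depot-A", 0.5, 0.0, 11.0),
    Observation("truck-2", 50.0, 50.0, 10.0),   # outside the query region
    Observation("truck-1", 0.2, 0.1, 500.0),    # outside the time window
]
result = subgraph_on_demand(observations, (-1, -1, 1, 1), (0, 100), 1.0, 5.0)
print(result)
```

Even this toy version shows where the cost comes from: the edge set depends on the query window and thresholds, so it has to be recomputed from the observations each time, and the pairwise step grows quadratically with the number of observations selected.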

These spatiotemporal entity graphs also have the largest write loads. Single sources generate tens of PB/day of new edges. There is a ton of industrial data that looks like this; it isn’t just people slinging structured data.
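For a sense of scale, here is the back-of-the-envelope arithmetic for that write load. The 100 bytes per edge record is purely an assumption for illustration, as is taking 10 PB/day as the low end of "tens of PB/day".

```python
SECONDS_PER_DAY = 86_400


def sustained_rates(petabytes_per_day, bytes_per_edge):
    """Convert a daily ingest volume into sustained per-second rates.

    Uses decimal petabytes (10**15 bytes). bytes_per_edge is an assumed
    average record size, not a figure from any real system.
    """
    bytes_per_second = petabytes_per_day * 10**15 / SECONDS_PER_DAY
    edges_per_second = bytes_per_second / bytes_per_edge
    return bytes_per_second, edges_per_second


bps, eps = sustained_rates(petabytes_per_day=10, bytes_per_edge=100)
print(f"{bps / 1e9:.0f} GB/s sustained, {eps / 1e9:.2f} billion edges/s")
```

With those assumptions a single 10 PB/day source works out to roughly 116 GB/s, or a bit over a billion new edges per second, sustained around the clock.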

Graphs are everywhere, but we furiously avoid them because the scalability of operations over anything but severely constrained graphs is so poor. The small graphs you see in practice are selection bias.

NSA in particular heavily funded foundational theoretical and applied computer science research into scaling graph computing for decades. They had all kinds of boring graphs where trillions of edges was their Tuesday. The US military also uses large graph databases in fairly boring applications that probably didn’t require a graph database.