Hacker News new | ask | show | jobs
by josephg 841 days ago
Sounds like the makings of a huge library that I’m not sure I’d even use in my work. I use graphs heavily in my work, and my experience matches the people the author interviewed.

We always end up reimplementing graphs because:

- Performance matters, and no off the shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode because almost all nodes just have an edge pointing to the last added item).

- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.

- Writing something custom just isn’t much work anyway! Graphs are way simpler to reimplement than btrees. You can have a simple graph implementation in tens of lines. Our highly optimised library - with all the supporting algorithms - is still only a few hundred lines of code.

I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.

1 comments

What do you mean by “subgraph diffing”? I work with graphs a lot and use SQL almost all the time. Sometimes I need to compute connected components with python.
I have DAG. I can then define a proper subgraph from some set of nodes {A, B, C, ...} such that the subgraph contains all transitive dependencies of that set of nodes. Given two sets of nodes X and Y, I want the set difference between the subgraphs of nodes defined by X and Y (and all of their transitive dependencies). So, what nodes exist in the subgraph of X but not in Y, and vice versa?

Ie, if we have the graph { A -> B, A -> C } then the diff between {A} and {C} is ({}, {C}). And the diff between {B} and {C} is... well, ({B}, {C}).