| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by fatherzine 841 days ago

What an awesome article. Kudos to the author!

On the core observation "there are too many implementation choices", that is not quite right. True, the author mentions 4, and there are further variations. In practice, a library can:

1. Implement all suitable graph representations.

2. Implement algorithms tailored to the representation(s) that offer the highest performance.

3. Provide transformations from one representation to another. This is O(#representations), trivial to implement and trivial to use. Quite fair workload for both maintainers and users.

4. Bonus, provide import / export transformations from / to common standard library datatypes and idioms.

Memory and transformations are cheap, 99% of use-cases would likely find the overhead of transforming data, both in RAM and CPU, negligible.

Edit: "the harsh truth of working at Google is that in the end you are moving protobufs from one place to another." -- https://news.ycombinator.com/item?id=20132880

1 comments

josephg 841 days ago

Sounds like the makings of a huge library that I’m not sure I’d even use in my work. I use graphs heavily in my work, and my experience matches the people the author interviewed.

We always end up reimplementing graphs because:

- Performance matters, and no off the shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode because almost all nodes just have an edge pointing to the last added item).

- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.

- Writing something custom just isn’t much work anyway! Graphs are way simpler to reimplement than btrees. You can have a simple graph implementation in tens of lines. Our highly optimised library - with all the supporting algorithms - is still only a few hundred lines of code.

I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.

link

therobots927 840 days ago

What do you mean by “subgraph diffing”? I work with graphs a lot and use SQL almost all the time. Sometimes I need to compute connected components with python.

link

josephg 839 days ago

I have DAG. I can then define a proper subgraph from some set of nodes {A, B, C, ...} such that the subgraph contains all transitive dependencies of that set of nodes. Given two sets of nodes X and Y, I want the set difference between the subgraphs of nodes defined by X and Y (and all of their transitive dependencies). So, what nodes exist in the subgraph of X but not in Y, and vice versa?

Ie, if we have the graph { A -> B, A -> C } then the diff between {A} and {C} is ({}, {C}). And the diff between {B} and {C} is... well, ({B}, {C}).

link