Hacker News new | ask | show | jobs
by tikhonj 844 days ago
FGL is a great example of how to make a "nice" high-level graph interface suited for functional programming. I'm a big fan. But it's orders of magnitude too slow and memory-inefficient for performance-sensitive graph computations—if you have even moderately sized graphs and graph algorithms are a bottleneck, you'll need to use something else, and probably something domain-specific. Given the way the interface works, I don't think you could have a high-performance version that would scale to large graphs.

In my experience this leaves FGL in an awkward spot: on the one hand, it isn't sufficient for heavy-duty graph processing; on the other, if you don't need fancy high-performance graph algorithms, chance are that encoding your problem as a graph is going to be more awkward than defining some domain-specific types for what you're doing. Graphs are such a general structure that they're usually the wrong level of abstraction for higher-level domain-specific logic.

Of course, sometimes you're writing graph code specifically and you need a nice way to express your graph algorithms without worrying about performance. In that case, FGL is great. I wrote a tutorial about using it to [generate mazes][1] and it helped me express the algorithms better than I would have been able to do without it. But that still leaves it as too narrow for something to be "the" graph representation in a language's standard library.

[1]: https://jelv.is/blog/Generating-Mazes-with-Inductive-Graphs/

2 comments

But it's orders of magnitude too slow and memory-inefficient for performance-sensitive graph computations—if you have even moderately sized graphs and graph algorithms are a bottleneck, you'll need to use something else, and probably something domain-specific.

This seems a little pessimistic to me. There are plenty of application domains that can be conveniently represented using graphs where you might have thousands of nodes and edges — which is what I’d characterise as “moderately sized” — and your needs might only extend to relatively simple and efficient graph algorithms. FGL is excellent in this kind of scenario.

If you do need the kind of algorithms that explode in complexity then even a representation a couple of orders of magnitude more efficient won’t help you much either. Big-O is the thing that is going to spoil your day in this story, not the constant factor. Some problems simply don’t have convenient fast solutions and ideally with those you find a way to change the representation so the original problem doesn’t arise in the first place.

It’s true that there’s also a zone where you have significantly larger graphs but still only need computationally tractable algorithms, and in that case the overheads of a library like FGL become a factor in what is viable. I also don’t disagree with you (and Hillel in the original piece) that it would be difficult to define comprehensive graph functionality to include in a standard library when there are so many different trade-offs involved.

A good — and not entirely unconnected — analogy might be calculating with matrices. It’s convenient to have support for simple but widely useful cases like 3x3 and 4x4 built into your language or standard library. However, once you’re solving systems with hundreds or thousands of rows, you probably want more specialised tools like BLAS/LAPACK, and the structure of your matrix and how you can decompose it start to matter a lot more.

> you probably want more specialised tools like BLAS/LAPACK

The GraphBLAS and LAGraph are sparse matrix optimized libraries for this exact purpose:

https://github.com/DrTimothyAldenDavis/GraphBLAS

https://github.com/GraphBLAS/LAGraph/

Interesting. Under the hood FGL is mapping the graph to relatively efficient data structures like Patricia Trees as implemented in Data.IntMap so I would expect it to scale reasonably for inserting edges and mapping over nodes. I agree the memory inefficiency is definitely a limiting factor of the library. As you say I think it is best suited for expressing graph algorithms and if those calculations become the bottle neck a custom solution can be developed with the proof of concept already in place. Out of curiosity what number of nodes/edges would you consider a "moderately sized graph"? My user cases are typically on the order of 500-1000 nodes with a similar number of edges that require bfs and topological sorting.
I don't have an exact number in mind—I imagine it's pretty context-specific—but 500–1000 nodes seems like it would qualify.

I've played around with IntMap before and it's not a great data structure. It's a binary Patricia trie, which means that you quickly get a relatively deep tree with lots of pointer traversals. Unless I've managed to confuse myself on how it works, you'd end up with, what, at least 10 traversals to look up a value from 1000 keys?