Hacker News
by zwaps 1302 days ago
It seems you are also counting the time spent filling the nx DiGraph from Memgraph. You use a list operation to add the nodes, then iterate in Python over a list of nodes and use an nx list operation to add all the edges.

And then you don't even use the NumPy or SciPy implementation of PageRank in networkx. If it's in-memory anyway, why not benchmark networkx by loading a SciPy sparse matrix, running the NumPy/SciPy PageRank, and timing only that?
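To make the suggestion concrete, here is a minimal sketch of an algorithm-only timing, written in plain Python so it runs without NetworkX, NumPy, or SciPy installed. The graph is first loaded into a compressed sparse row (CSR) layout, the same layout a SciPy sparse matrix uses, and only the PageRank power iteration sits inside the timer. The helper names `build_csr` and `pagerank_csr` and the toy edge list are hypothetical; `alpha=0.85` matches NetworkX's default damping factor, while `tol=1e-10` and `max_iter=1000` are assumptions chosen so this small example converges. Like NetworkX's PageRank, it spreads the rank of dangling nodes (nodes with no out-edges) uniformly, but it is an illustration, not NetworkX's code.

```python
import time


def build_csr(num_nodes, edges):
    """Build a CSR adjacency (row pointers + column indices) from (src, dst) pairs."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    indptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        indptr[i + 1] = indptr[i] + counts[i]
    indices = [0] * len(edges)
    fill = indptr[:-1]  # slicing copies, so indptr itself is not modified
    for src, dst in edges:
        indices[fill[src]] = dst
        fill[src] += 1
    return indptr, indices


def pagerank_csr(indptr, indices, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a CSR graph; dangling rank is spread uniformly."""
    n = len(indptr) - 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        dangling = 0.0
        for u in range(n):
            start, end = indptr[u], indptr[u + 1]
            out_degree = end - start
            if out_degree == 0:
                dangling += rank[u]  # no out-edges: redistribute to everyone
                continue
            share = rank[u] / out_degree
            for k in range(start, end):
                new[indices[k]] += share
        base = (1.0 - alpha) / n + alpha * dangling / n
        new = [base + alpha * x for x in new]
        err = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if err < n * tol:  # stop when the L1 change is small enough
            return rank
    raise RuntimeError("PageRank did not converge")


if __name__ == "__main__":
    # Hypothetical toy graph; in a real benchmark the edges would come from the database.
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    indptr, indices = build_csr(4, edges)  # loading phase: not timed
    t0 = time.perf_counter()
    ranks = pagerank_csr(indptr, indices)  # only the algorithm is timed
    elapsed = time.perf_counter() - t0
    print([round(r, 4) for r in ranks], f"{elapsed * 1e3:.3f} ms")
```

Building the CSR arrays outside the timed region is what separates "how fast is the PageRank" from "how fast is getting the data into memory".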

Like, it’s not as if anyone uses networkx for performance, so dunking on it for that is probably not as good of a marketing post as it seems.

But also check your implementation, because this is surely the slowest way to use networkx, and a speedup of only five times seems small. Don't igraph or Julia beat properly implemented networkx by much more?

Look, if it weren't an advertisement I would not say anything, but it seems you are comparing your new performance car with the networkx family van, which may also be filled with concrete.


I considered different ways of doing the comparison here and decided to go with a simple comparison on a sample dataset, just to get a feel for it. I did consider doing it all in Python, but then it wouldn't be fair to Memgraph. It also depends on the query we are performing: I could have run a much more complicated query that would have given better results, but then again, that wouldn't be fair either. If I excluded the time spent filling the DiGraph, only the pure algorithm time would be measured, but the main difference between NetworkX and Memgraph is that Memgraph offers persistence, while NetworkX always has to load the graph into memory. What the best way to run a true benchmark would be, and on what kind of dataset, is open for discussion. I did not go into the details of the graph type here, but there are certainly cases where Memgraph outperforms NetworkX at a much larger scale and on certain graph types. I didn't claim that we are 5 times faster in every case, just in this particular case. When I do a proper benchmark in the future, I will make sure to be as fair as possible to both sides and, of course, to show more clearly when to use Memgraph and when to use NetworkX, since it all depends on your needs.

Also, thanks for reading it; it means a lot to get a comment like this. I get to learn from it too :)
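The distinction drawn in the reply above, between end-to-end time (including filling the DiGraph) and pure algorithm time, can be made explicit by timing the two phases separately and reporting both. Below is a hypothetical, library-agnostic sketch: `benchmark_phases`, `load`, and `run` are illustrative names, not part of Memgraph's or NetworkX's API. `load` stands for whatever builds the in-memory graph (for example, fetching edges from the database), and `run` stands for the algorithm call.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark_phases(
    load: Callable[[], Any],
    run: Callable[[Any], Any],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time the load phase and the algorithm phase separately.

    Each repeat calls load() and then run(graph). Returns the median
    seconds of each phase and the median of the per-repeat totals.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    load_times, run_times, total_times = [], [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        graph = load()  # e.g. fetch edges and build the graph (hypothetical)
        t1 = time.perf_counter()
        run(graph)  # e.g. a PageRank call (hypothetical)
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        run_times.append(t2 - t1)
        total_times.append(t2 - t0)
    return {
        "load_s": statistics.median(load_times),
        "algorithm_s": statistics.median(run_times),
        "end_to_end_s": statistics.median(total_times),
    }


if __name__ == "__main__":
    # Toy stand-ins: a ring of 100_000 edges and an out-degree count.
    def load_ring():
        return [(i, (i + 1) % 100_000) for i in range(100_000)]

    def count_out_degrees(edges):
        degrees = {}
        for src, _ in edges:
            degrees[src] = degrees.get(src, 0) + 1
        return degrees

    print(benchmark_phases(load_ring, count_out_degrees, repeats=3))
```

A speedup computed from `algorithm_s` alone can differ a lot from one computed from `end_to_end_s`, so reporting both makes it clear which one a figure such as "5 times faster" refers to. Taking the median over several repeats also damps one-off noise such as a cold cache on the first run.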

Firstly, congratulations on working on a very interesting project, and thanks for the insights.

Indeed, there are some points in your reply which I think would fit better in a networkx vs. Memgraph comparison post.

That being said, your blogpost is titled "Who ranks better?" and it is mainly about the speed of running a PageRank.

Networkx is a no-frills Python package that is much easier to use and experiment with. Outperforming networkx in speed is not really a feat, though any new network package should certainly do it, and, further, do it with networkx on an equal footing. For instance, igraph is 20 times faster than networkx at PageRank, and graph-tool is over 50 times faster (excluding load times)!

And I am sure Memgraph can do so as well; it's just that this blog post doesn't seem to conclusively demonstrate it.

It would make little sense to me to use networkx as a tool to load data from Memgraph. And to be honest, using this triple (quadruple) Python list operation, and not even using the NumPy-based performance that networkx does offer (little as it may be), just doesn't seem right.

I understand that Memgraph has other advantages, like persistence, but in that case networkx is simply not a good point of comparison. If that's the focus, why not query a local Neo4j? That's going to be a pretty speedy PageRank as well, and an interesting challenge.

All in all, I am sure Memgraph performs great, and I am looking forward to other comparisons in the future!

"Who ranks better?" was a play on words, because of PageRank :') But yes, I totally agree with what you wrote, and I will aim for more detailed comparisons in the next articles. And you are right, NetworkX is an easy-to-use Python library, and I wanted to show that Memgraph is also easy to use and Python-friendly, as well as fast. It also has a number of popular graph algorithms already implemented, so anyone working with graphs in NetworkX may find the performance gain shown in this article useful. Regarding Neo4j, wait for it ;)