by junon 1248 days ago
Yeah, that was strange. My understanding is that you can't compare benchmarks between different machines, especially if the hardware isn't 1:1 identical.

If you're referring to this line, then it struck me as very odd.

> Instead of 112 queries per second, I get 531q/s. Instead of a p99 latency of 94.49ms, I get 28ms with a min, mean, p50, p75 and p95 of 14ms to 18ms. Alright, what about query 2? Same story.

Otherwise, the article holds up.
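For scale, here is the arithmetic on the query 1 figures quoted above: same test, different hardware. (Later in the thread it's established that the author only re-ran Neo4j.) This is only a ratio of the quoted numbers. It says nothing about Memgraph.

```python
# Figures as quoted from the article for query 1: the original benchmark
# hardware vs. the article author's machine, same test.
old_qps, new_qps = 112, 531
old_p99_ms, new_p99_ms = 94.49, 28

throughput_gain = new_qps / old_qps   # higher throughput is better
p99_gain = old_p99_ms / new_p99_ms    # lower latency is better

print(f"throughput: {throughput_gain:.2f}x")   # throughput: 4.74x
print(f"p99 latency: {p99_gain:.2f}x")         # p99 latency: 3.37x
```

A hardware change alone moved this one test by roughly 3-5x, which is the same order as the 2-3x database-vs-database spreads being argued about.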


It was...

> It looks like Neo4j is faster than Memgraph in the Aggregate queries by about 3 times. Memgraph is faster than Neo4j for the queries they selected by about 2-3x except...

Unless that's meant to be a joke? Maybe they were dunking on the "bullshit" benchmark with a worse comparison.

Oh, then I'm not sure what you mean. That line makes sense - the final benchmarks were performed on the author's machine. That's where the conclusion comes from.

In theory, the spread between two benchmarked programs shouldn't differ hugely between machines, unless one program takes advantage of the hardware (or OS) on one of the machines and the other doesn't (e.g. newer I/O interfaces such as io_uring, SIMD support, or in some cases multithreading).

2-3x is a much more reasonable spread than 100x. If they really did see 100x speedups, the culprit may be that they're using obscenely old hardware, which would be disingenuous given that people don't typically run graph databases on such infrastructure anymore.
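A minimal sketch of that reasoning with made-up numbers: assume a newer machine speeds both programs up by the same factor (i.e. neither exploits hardware the other doesn't). The spread is then unchanged across machines, but comparing one program on the new machine against the other on the old machine folds the hardware factor into the result. The throughputs and the 4x factor are hypothetical.

```python
# Hypothetical throughputs (queries/s) for programs A and B on an older
# machine. Illustration numbers only, not benchmark data.
a_old, b_old = 100.0, 250.0

# Assumption: the newer machine speeds BOTH programs up by the same factor.
HARDWARE_FACTOR = 4.0
a_new, b_new = a_old * HARDWARE_FACTOR, b_old * HARDWARE_FACTOR

def spread(x_qps: float, y_qps: float) -> float:
    """How many times x's throughput is y's throughput."""
    return x_qps / y_qps

same_old = spread(b_old, a_old)  # both programs on the old machine
same_new = spread(b_new, a_new)  # both programs on the new machine
mixed = spread(b_old, a_new)     # B on the old machine vs. A on the new one

print(f"B vs A, old machine: {same_old}x")  # 2.5x
print(f"B vs A, new machine: {same_new}x")  # 2.5x
print(f"B (old) vs A (new):  {mixed}x")     # 0.625x
```

In the mixed case B looks slower than A, even though it is 2.5x faster on equal hardware.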

> the final benchmarks were performed on the author's machine

No, AFAICT the author didn't run any benchmarks for Memgraph, only for Neo4j. The Memgraph numbers at the end come from the old benchmark, so they were measured on the old hardware.

I don't think that's right. That's not how I understood the article.

The author published the code, and I only see adapters that interface with Neo4j: https://github.com/maxdemarzi/memgraph_benchmark/tree/main/s...

The numbers for Memgraph match what is shown on their benchmark website.

The author confirmed in a follow-up comment that only Neo4j was tested.

I think that part is still fine, because there he's only saying that he got different results for the same test on his hardware, which might help establish a baseline. What's really weird is the table after "Let’s see the breakdown". It's not clearly labeled, so I'm not fully sure which data is which, but it looks like he's comparing Neo4j on his machine to Memgraph on their older hardware, which would be very silly. Looking at the source for the benchmark, that also seems to hold.

The author is just stating the differences between the benchmarketing hardware and his own. He's not comparing one DB on new hardware with the other DB on old hardware.

The bottom table contains the Memgraph mgBench (G6 2x Xeon X5650) "hot run, medium, isolated" throughput results:

https://memgraph.com/benchgraph/base?condition=hot&datasetSi...

The "By" columns compare those results to the new test suite on the roughly 10-year-newer CPU.

But those are needless comparisons that make no sense to even mention. Of course the performance profile is different. That was the original point, if I understand correctly.

EDIT: GP followed up; I did not in fact understand correctly.

It seems like he does in his conclusion:

> It looks like Neo4j is faster than Memgraph in the Aggregate queries by about 3 times.

Having re-read it, I now can't decide. It would be a little silly if the author were comparing results from completely different devices, so I'm going to stick with the interpretation that he isn't.