by skafoi 1774 days ago

Regarding the performance comparison with neo4j: the challenge for an honest and fair test is how to properly compare a server-based solution with a serverless one. TiloDB automatically scales up and down without any further interaction because it uses Lambdas for all calculations. So would you compare it with a relatively small neo4j instance or with a large cluster? I honestly don't know. When we started doing this internally at our previous company, we obviously tested the edge cases. After all, we didn't want to build our own solution. For graph databases, the problem cases are either a lot of edges leaving one node or a long chain of edges. The first scenario was still handled OK-ish: response times of around 6 seconds for 1,000 nodes, if I remember correctly. The second scenario was a total failure. The problem with the latter lies in the transitivity, as a graph database has to jump from one node to the next, and so on. To be fair though: when it comes to dynamic entities, i.e. choosing at query time which rules are relevant, graph databases might be the better choice, especially when response times don't matter.
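
To make the long-chain point a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how TiloDB or neo4j actually work internally: the helper names (`build_chain`, `entity_by_traversal`, `build_entity_index`) and the union-find index are assumptions made up for this example. It contrasts resolving an entity by walking edges at query time with looking it up in an index that was built at write time.

```python
from collections import deque

def build_chain(n):
    """Adjacency list for an undirected chain 0 - 1 - ... - (n-1)."""
    adj = {i: [] for i in range(n)}
    for i in range(n - 1):
        adj[i].append(i + 1)
        adj[i + 1].append(i)
    return adj

def entity_by_traversal(adj, start):
    """Resolve the entity at query time with a breadth-first walk.

    Returns the set of connected records and the number of edge
    lookups performed; each lookup depends on the previous one.
    """
    seen = {start}
    queue = deque([start])
    edge_lookups = 0
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            edge_lookups += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen, edge_lookups

def build_entity_index(adj):
    """Precompute record -> entity id at write time (union-find).

    Hypothetical stand-in for an index; the smallest record id in a
    component is used as the entity id.
    """
    parent = {node: node for node in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, neighbours in adj.items():
        for b in neighbours:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {node: find(node) for node in adj}

adj = build_chain(1000)
members, edge_lookups = entity_by_traversal(adj, 0)
index = build_entity_index(adj)
print(len(members), edge_lookups, index[999])
```

In the traversal variant the work at query time grows with the length of the chain, while the indexed variant answers with a single key lookup and pays the cost when records are written instead.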

The response times provided in the article are for the whole process of searching and returning the entity. The indexes themselves are obviously a lot faster. To be precise, we are using DynamoDB for storing the indexes, which usually returns results in under 10 ms. Compared to other databases this may still sound slow, but we know that we won't run into scaling issues this way, and that is what matters most for us right now.

Hope what I wrote makes some sense.


lmeyerov 1774 days ago

Separate benchmark per claim and core use case :) A scale-to-zero + autoscaling graph db could be both broadly relevant and differentiated, so I'd be curious there + table stakes for regular queries.

RE: extremes, we see graph DBs do OK for small time series (ex: 2 nodes with a bunch of event multiedges), but not full-blown time series... where we'd use a tsdb. Some vendors demo this, but it always felt like the wrong tool.

The many-hop case is interesting! We don't see 1K hops typically, and I get nervous even at 10-20 on graph DBs we've used. I can imagine that happening more in logistics or sciences, or maybe even some RDF systems. Partition keys start mattering fast, whether on a KV DB or an MPP, but I don't have an intuition here. Probably easier to differentiate on, but too niche?


skafoi 1774 days ago

Thanks for your input.

1k hops is also not something we saw on a regular basis in our old business, which was mostly about people moving house and transactional data from payment service providers. People with money issues seem to move a lot more often, and fraud cases also often involve a lot of hops.
