by espeed 5438 days ago
a well designed relational database will execute your graph queries faster

Why do you say that? -- see "MySQL vs. Neo4j on a Large-Scale Graph Traversal" (http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-lar...).

You can represent a graph in almost any data structure, including a relational database. But the difference between a graph database and everything else is that in a real graph database (like Neo4j), each node has an internal/local index for its adjacent nodes, so it doesn't have to do an external lookup for each traversal step.
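
To make the distinction concrete, here is a minimal Python sketch of the two layouts. It is only an illustration, not how Neo4j or any particular database is implemented; the names `edge_table`, `edge_index`, `Node`, and the tiny four-node graph are all made up for the example.

```python
from collections import defaultdict

# Relational-style layout: one global edge table plus one global index on it.
edge_table = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
edge_index = defaultdict(list)  # source id -> row numbers in edge_table
for row, (src, _dst) in enumerate(edge_table):
    edge_index[src].append(row)


def neighbors_via_index(node_id):
    """Each hop consults the shared index, then reads rows from the table."""
    return [edge_table[row][1] for row in edge_index[node_id]]


def two_hops_indexed(start_id):
    """Two traversal steps, each one an external index lookup."""
    return sorted(
        {n2 for n1 in neighbors_via_index(start_id) for n2 in neighbors_via_index(n1)}
    )


# Graph-style layout: every node object holds direct references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # node-local adjacency list


nodes = {name: Node(name) for name in "abcd"}
for src, dst in edge_table:
    nodes[src].out.append(nodes[dst])


def two_hops_local(start):
    """Two traversal steps that only follow references stored on the nodes."""
    return sorted({n2.name for n1 in start.out for n2 in n1.out})
```

Both functions return the same answer; the point is where the work happens. In the sketch the shared index is a Python dict, but in a relational database it is typically a B-tree over the whole edge table, so the cost of each hop grows with the total number of edges rather than with the number of edges on the current node.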

Watch this video on "The Graph Traversal Programming Pattern" to see what I'm talking about (http://vimeo.com/13213184).

1 comments

From the article:

"However, no attempts have been made to optimize the Java VM, the SQL queries, etc"

The emphasis there is on optimizing the SQL. We have run tests comparing Neo4j and Postgres, and Postgres comes out with greater throughput for our data set, where our database implementation was done by people who know Postgres extremely well. Where you will see especially large differences is in aggregate queries, such as counting the number of connections of a certain type coming into each of a set of nodes and then sorting those nodes by that count. A SQL database is much better at that kind of thing.
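
For reference, the kind of aggregate query described above might look roughly like the following. This is a sketch using Python's built-in `sqlite3` module rather than Postgres, and the `edges` table, its columns, and the sample rows are hypothetical, not the schema from the tests mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: one row per directed, labeled connection.
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("alice", "bob", "friend"),
        ("carol", "bob", "friend"),
        ("alice", "carol", "friend"),
        ("bob", "carol", "follows"),  # different edge type, filtered out below
    ],
)


def incoming_counts(conn, label, node_ids):
    """Count incoming edges of one type per node, most-connected first."""
    placeholders = ",".join("?" for _ in node_ids)
    sql = (
        "SELECT dst, COUNT(*) AS n FROM edges "
        f"WHERE label = ? AND dst IN ({placeholders}) "
        "GROUP BY dst ORDER BY n DESC, dst"
    )
    return conn.execute(sql, [label, *node_ids]).fetchall()


result = incoming_counts(conn, "friend", ["alice", "bob", "carol"])
print(result)  # [('bob', 2), ('carol', 1)]
```

Nodes with no incoming edges of that type (here `alice`) simply do not appear in the result; a `LEFT JOIN` against a node table would be needed to list them with a count of zero.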

What were you using to query the graph?

Gremlin has significantly improved what you can do with graph aggregating and sorting:

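  // count incoming friends for each node and sort by most friends
  m = [:]
  // follow each 'friend' edge to its head vertex, so every vertex is counted once per incoming edge
  g.V.outE('friend').inV.groupCount(m)
  // sort entries by count, descending
  m.sort{a,b -> b.value <=> a.value}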