by thadeus_venture 5438 days ago
Funny thing though, a well designed relational database will execute your graph queries faster. It won't do generic graph queries faster, but if your SQL data model and code are optimized for the type of graph data you are storing, SQL will almost always be significantly faster, by orders of magnitude in some cases. Obviously that's not a data model argument, but it's a real-world argument nevertheless. If you then need to scale further, you will probably end up putting the same caching layer between your database and the front end, but at least all your data will be in one place (the SQL database).

a well designed relational database will execute your graph queries faster

Why do you say that? -- see "MySQL vs. Neo4j on a Large-Scale Graph Traversal" (http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-lar...).

You can represent a graph in almost any data structure, including a relational database. But the difference between a graph database and everything else is that in a real graph database (like Neo4j), each node has an internal/local index for its adjacent nodes, so it doesn't have to do an external lookup for each traversal step.
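
To make that distinction concrete, here is a rough Python sketch of the two traversal styles. It is only an illustration, not how Neo4j or any relational engine is actually implemented; the toy graph and all names (`EDGES`, `build_index`, `Node`, `traverse_local`, and so on) are made up for the example. The first style sends every step back through one shared index, roughly like an indexed edge table; the second follows references stored on the node itself.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, target) pairs.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def build_index(edges):
    """Shared index from node id to neighbour ids, standing in for an indexed edge table."""
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index


def traverse_via_index(index, start, hops):
    """Each step looks every frontier id up again in the shared (external) index."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for node_id in frontier for dst in index.get(node_id, [])}
    return frontier


class Node:
    """Node that keeps direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.out = []  # local adjacency list, no shared index involved


def build_nodes(edges):
    """Build Node objects and wire up their local adjacency lists."""
    nodes = {}
    for src, dst in edges:
        source = nodes.setdefault(src, Node(src))
        target = nodes.setdefault(dst, Node(dst))
        source.out.append(target)
    return nodes


def traverse_local(start_node, hops):
    """Each step only follows the references held by the current nodes."""
    frontier = {start_node}
    for _ in range(hops):
        frontier = {neighbour for node in frontier for neighbour in node.out}
    return {node.name for node in frontier}
```

Both functions return the same nodes; the difference is only where the adjacency information lives. In a real system the shared index would typically be a B-tree whose lookup cost grows with the size of the whole table, while the local list depends only on the degree of the node being visited.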

Watch this video on "The Graph Traversal Programming Pattern" to see what I'm talking about (http://vimeo.com/13213184).

From the article:

"However, no attempts have been made to optimize the Java VM, the SQL queries, etc"

Emphasis being on optimizing the SQL. We have run tests comparing Neo4j and Postgres, and Postgres comes out with greater throughput for our data set, where our database implementation was done by people who know Postgres extremely well. Where you will see especially great differences is in aggregate queries, such as when you want to count the number of connections of a certain type coming into a set of nodes, and then sort those nodes by that number. A SQL database is much better at stuff like that.
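
For illustration, that kind of aggregate looks roughly like the sketch below. It is not the schema or the queries from the tests mentioned above: the `edges` table, its columns, the index name and the sample rows are all assumptions, and it uses Python's built-in `sqlite3` module rather than Postgres so that it runs on its own.

```python
import sqlite3


def top_nodes_by_incoming(conn, edge_type):
    """Count incoming edges of one type per target node, most connected first."""
    sql = """
        SELECT dst, COUNT(*) AS incoming
        FROM edges
        WHERE type = ?
        GROUP BY dst
        ORDER BY incoming DESC, dst ASC
    """
    return conn.execute(sql, (edge_type,)).fetchall()


def demo():
    """Build a tiny in-memory edge table (hypothetical data) and run the aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, type TEXT)")
    # Assumed index so the filter and grouping can be served from the index.
    conn.execute("CREATE INDEX edges_type_dst ON edges (type, dst)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("alice", "bob", "friend"),
            ("carol", "bob", "friend"),
            ("alice", "carol", "friend"),
            ("bob", "carol", "follows"),
        ],
    )
    return top_nodes_by_incoming(conn, "friend")
```

With the sample rows, `demo()` returns `[("bob", 2), ("carol", 1)]`. The counting and sorting happen in one set-based `GROUP BY` / `ORDER BY` pass instead of a per-node traversal.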

What were you using to query the graph?

Gremlin has significantly improved what you can do with graph aggregating and sorting:

  // count incoming friends for each node and sort by most friends
  m = [:]
  g.V.inE('friend').outV.groupCount(m)
  // sort the map entries by count, highest first
  m.sort{ a, b -> b.value <=> a.value }