HN Mirror (Hacker News)


by mumblemumble 2460 days ago
Yes. It may even be the best way to do it. For example, here's a paper in which the authors design a schema and a transpiler for running a Gremlin-queryable graph DB on top of PostgreSQL, and find that it outperforms Neo4j and Titan: https://static.googleusercontent.com/media/research.google.c...


gbear0 2460 days ago

That's interesting, but it kind of makes sense: it would be optimizing specific access patterns by translating them into relational models, rather than using the normal graph-walking algorithms to find relations.

An anecdote from one project: while trying to speed up some Neo4j queries myself, I modeled a binary tree structure in the nodes (child/parent relations) and compared query times for plain Cypher queries against Cypher queries that called some embedded library functions to walk the tree exactly the way I wanted. The custom walk was much faster, even though I hadn't optimized its code much.
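
To illustrate the difference in plain Python (this is an analogy, not Cypher or any Neo4j API): a walk that knows nothing about the data's shape has to scan the generic edge list at every step, while a walk that knows "this relation is a tree, so each node has at most one parent" can follow a single pointer per level. The node names and the `PARENT` label below are hypothetical.

```python
from collections import deque

# Hypothetical binary tree stored as generic labelled edges, the way a graph
# DB would hold (child)-[:PARENT]->(parent) relationships.
edges = [
    ("b", "PARENT", "a"), ("c", "PARENT", "a"),
    ("d", "PARENT", "b"), ("e", "PARENT", "b"),
    ("f", "PARENT", "c"), ("g", "PARENT", "c"),
]

def ancestors_generic(start):
    """Shape-agnostic walk: BFS that rescans every edge for each visited node."""
    seen, result = {start}, []
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for src, rel, dst in edges:
            if src == node and rel == "PARENT" and dst not in seen:
                seen.add(dst)
                result.append(dst)
                frontier.append(dst)
    return result

# Structure-aware walk: because we know the relation is a tree, each node has
# at most one parent, so one dict lookup per level replaces the edge scan.
parent_of = {src: dst for src, rel, dst in edges if rel == "PARENT"}

def ancestors_tree(start):
    result, node = [], start
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

print(ancestors_generic("e"))  # ['b', 'a']
print(ancestors_tree("e"))     # ['b', 'a']
```

Both return the same ancestors; the generic version costs roughly depth × edge count, the tree-aware one roughly depth.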

The test got me thinking: if I could declare more information about how the relationships are structured, maybe the DB could automatically use more appropriate algorithms and data structures for certain node types. I think that's similar to what's happening here, where simple graph relations are automatically mapped onto structured relational tables. I hope that in the future we'll be able to give the DB more input of this kind (at least, I haven't seen anything like it yet). Let me annotate my schema to say that my parent/child relationship is a tree, or that my word-map nodes form a trie whose leaves are some other type. Why can't we think of a DB as offering multiple data structures beyond just a KV store, a table, or a graph?
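
A rough sketch of what such an annotation might enable, assuming a purely hypothetical schema hint (`SCHEMA_HINTS`, `make_store`, `TrieStore`, and the `NEXT_CHAR` relationship name are all invented for illustration; no real database exposes this API). A relationship declared as a trie gets a trie-backed store with typed leaves, so prefix lookups descend one character per level instead of pattern-matching arbitrary edges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class WordInfo:
    # Hypothetical leaf payload: a different type from the trie's inner nodes.
    word: str
    frequency: int

@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    leaf: Optional[WordInfo] = None

class TrieStore:
    """Storage a DB *could* choose for a relationship annotated as a trie."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, info: WordInfo) -> None:
        node = self.root
        for ch in info.word:
            node = node.children.setdefault(ch, TrieNode())
        node.leaf = info

    def with_prefix(self, prefix: str) -> List[WordInfo]:
        node = self.root
        for ch in prefix:  # descend one level per character
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [node]  # collect every leaf below the prefix node
        while stack:
            n = stack.pop()
            if n.leaf is not None:
                out.append(n.leaf)
            stack.extend(n.children.values())
        return sorted(out, key=lambda w: w.word)

# Hypothetical schema annotation: relationship name -> declared structure.
SCHEMA_HINTS = {"NEXT_CHAR": "trie"}

def make_store(relationship: str) -> TrieStore:
    if SCHEMA_HINTS.get(relationship) != "trie":
        raise NotImplementedError(f"no specialised storage for {relationship!r}")
    return TrieStore()

store = make_store("NEXT_CHAR")
for w, f in [("car", 5), ("cart", 2), ("cat", 7), ("dog", 1)]:
    store.insert(WordInfo(w, f))
print([w.word for w in store.with_prefix("ca")])  # ['car', 'cart', 'cat']
```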


mumblemumble 2460 days ago

It's been a while since I read the paper in detail, but, IIRC, it _is_ using normal graph-walking algorithms. They're just implemented in SQL.
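
As a minimal illustration of a graph walk written in SQL (this is not the paper's schema or transpiler, and SQLite stands in for PostgreSQL), a recursive common table expression can compute everything reachable from a start vertex. The `edges` table and the sample data are assumptions made for the example.

```python
import sqlite3

# Simplest relational encoding of a directed graph: one row per edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e"), ("x", "y")],
)

# The anchor row is the start vertex; each recursive step follows one more
# hop. UNION (rather than UNION ALL) drops rows already produced, so the
# walk also terminates on cyclic graphs.
REACHABLE = """
WITH RECURSIVE reach(node) AS (
    SELECT ?
    UNION
    SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
)
SELECT node FROM reach WHERE node <> ? ORDER BY node
"""

def reachable(start):
    return [row[0] for row in conn.execute(REACHABLE, (start, start))]

print(reachable("a"))  # ['b', 'c', 'd', 'e']
```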

That it's implemented on top of a relational database seems like a red herring to me. The relational model just defines operations on sets of tuples. A graph is just a particular kind of thing you can construct with sets and tuples.
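
To make the "sets and tuples" point concrete: a directed graph is a binary relation, a set of `(src, dst)` pairs, and a relational join of that set with itself yields two-hop paths. Repeating the join until nothing new appears gives the transitive closure. The sketch below uses a naive nested-loop join for clarity; a real query engine would pick a smarter join strategy.

```python
# A graph as nothing more than a set of (src, dst) tuples.
edges = {("a", "b"), ("b", "c"), ("b", "d"), ("c", "e")}

def compose(r, s):
    """Join r and s on r.dst = s.src, projecting to (r.src, s.dst)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def transitive_closure(r):
    """Join and union repeatedly until a fixpoint: all reachable pairs."""
    closure = set(r)
    while True:
        new = closure | compose(closure, r)
        if new == closure:
            return closure
        closure = new

print(sorted(compose(edges, edges)))  # [('a', 'c'), ('a', 'd'), ('b', 'e')]
```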

From there, the query planner and execution engine take over, and an incumbent RDBMS's planner and executor are backed by decades' worth of accumulated, hard-won knowledge about optimizing execution plans and efficiently traversing large datasets on hardware with a hierarchical memory model.
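
A small taste of the planner doing that work for you, using SQLite as a convenient stand-in (PostgreSQL's `EXPLAIN` output looks different): the same edge lookup is planned as a full table scan until an index exists, after which the planner switches to an index search on its own. The table, index name, and data are assumptions for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(i, (i * 7) % 1000) for i in range(1000)])

QUERY = "SELECT dst FROM edges WHERE src = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]

before = plan(QUERY)  # no index yet, so the planner must scan the table
conn.execute("CREATE INDEX idx_edges_src ON edges (src, dst)")
conn.execute("ANALYZE")  # give the planner table statistics
after = plan(QUERY)   # the planner can now search the index instead

print(before)
print(after)
```

The query text never changes; only the physical access path the planner chooses does.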

By contrast, Neo4j (to take an example) has a steeper hill to climb, both because it hasn't had to spend decades competing with Oracle and because it's implemented in a language (Java) that's less than ideal for chasing raw performance.


AndrewBowman 2459 days ago

That compares against Neo4j 1.9.4, released in 2013. All the technologies in question have improved a lot since then, graph DB technology especially in efficiency and speed, so I don't think that paper is as relevant anymore. I'd love to see a more up-to-date comparison.
