

	by danenania 2460 days ago
	“It should be emphasized that graph databases can do all other types of databases (relational, document, key/value, etc.)” Not to knock graph dbs, but isn’t the reverse also true?

4 comments

mumblemumble 2460 days ago

Yes. And it may even be the best way to do it. For example, here's a paper where the authors come up with a schema and transpiler for doing a Gremlin-queryable graph DB in PostgreSQL, and find that it outperforms Neo4j and Titan:

https://static.googleusercontent.com/media/research.google.c...


gbear0 2460 days ago

That's interesting, but kinda makes sense since it would be optimizing specific access patterns by translating it to relational models rather than using the normal graph walking algs to find relations.

As an anecdote from one project: while trying to speed up some neo4j queries myself, I decided to model a binary tree structure in the nodes (child/parent relations) and then compared the query times for simple cypher queries vs. cypher queries with some embedded lib functions that would walk the tree exactly the way I wanted. The embedded-function queries were much faster, even though I hadn't optimized that code much.

The test got me thinking that if I could have a way of declaring more info about how the relationships are related, then maybe we could automatically have the db use more appropriate algs for a more appropriate data structure for certain node types. I think that's similar to what's happening here, it's automatically mapping out the simple graph relations to structured relational db tables. I hope in the future we'll be able to provide more input to that as well (or at least I haven't seen something like this yet). Let me annotate my schema to specify my parent/child relationship as a tree, or my word map nodes form a trie but the leaves should be some other type. Why can't we think of a db as having multiple datatypes beyond just a kv-store, or table, or graph?


mumblemumble 2460 days ago

It's been a while since I read the paper in detail, but, IIRC, it _is_ using normal graph walking algorithms. They're just implemented in SQL.

That it's implemented on top of a relational database seems like a red herring to me. The relational model just defines operations on sets of tuples. A graph is just a particular kind of thing you can construct with sets and tuples.
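A minimal sketch of the "sets and tuples" point, in Python rather than SQL. The graph, the name `edges`, and the helper `two_hop` are made up for illustration: the edge relation is a set of (source, target) tuples, and a two-hop traversal is a self-join on that relation.

```python
# Hypothetical edge relation: a set of (source, target) tuples.
edges = {("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")}


def two_hop(rel):
    """Self-join: all (x, z) such that x -> y and y -> z for some y."""
    return {(x, z) for (x, y1) in rel for (y2, z) in rel if y1 == y2}


print(sorted(two_hop(edges)))  # [('a', 'c'), ('a', 'd'), ('b', 'd')]
```

This nested-loop join is the naive plan; choosing something better is the query planner's job.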

From there, the query planner and execution engine take over, and an incumbent RDBMS's query planner and execution engine are supported by decades and decades worth of accumulated dark knowledge on how to optimize execution plans and efficiently traverse large datasets in the presence of a hierarchical memory model.

By contrast, Neo4j (to take an example) has a steeper hill to climb, both in terms of not having had to spend decades trying to compete with Oracle and in terms of being implemented in a less-than-ideal language for chasing raw performance.


AndrewBowman 2459 days ago

That compares against Neo4j 1.9.4, released in 2013. All technologies in question have improved much since then, especially graph db technology, efficiency, and speed, so I don't think that paper has as much relevance anymore. Would love to see a more updated comparison.


lmkg 2460 days ago

One of the key features of graph databases is to select one node, and then recursively 'chase' edges until you find another node matching some criteria. Other database models can have trouble representing chasing an unbounded number of edges. E.g. in the relational model, following an edge to another node is usually represented as a Join operation, and SQL doesn't let you parameterize the number of iterated joins. This is especially true if the thing you want to query is actually the path length.
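To make the "chase edges until a node matches" operation concrete, here is a rough Python sketch using breadth-first search over a plain adjacency dict. The graph and the function name `hops_to_match` are hypothetical. The number of hops is not known up front, and the answer being asked for is the path length itself.

```python
from collections import deque


def hops_to_match(adjacency, start, matches):
    """Return the number of edges from `start` to the nearest node for which
    `matches(node)` is true, or None if no such node is reachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if matches(node):
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:  # guard against cycles
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical graph with a cycle (a -> b -> c -> a) and a tail (c -> d -> e).
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"]}
print(hops_to_match(graph, "a", lambda n: n == "e"))  # 4
```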

In an HN thread from a few days ago, someone made the claim that the graph model could be represented by SQL + recursion, and recursive SQL is an extension offered by some databases. But the relational model itself cannot fully represent the graph model.

Without digging too deep, I suspect other database models run into similar problems. E.g. a document store could very easily represent a Directed Acyclic Graph as a document, but when you get into general graphs your document needs to end on a value that is the key to another graph.

This is not to agree with the claim that graph databases are generally superior. I like them, and they're fun, and I think more developers should be aware of them for cases where they apply, but I also don't think they have advantages over relational or document stores when the data is natively table-shaped or DAG-shaped.


henryfjordan 2460 days ago

You absolutely can use recursive SQL to build a graph in a relational database, I've done it. You make a table with a primary key and some data and then make another table that represents edges between objects in the first table. Then, like you describe, you can use recursive queries (built into Postgres) to query your graph. You end up with an adjacency list which might not be the most efficient way to represent your graph, but it works well enough up to a certain size.
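A minimal sketch of that two-table layout and a recursive query. It uses SQLite through Python's `sqlite3` module instead of Postgres, only so the example is self-contained. The table names, column names, and sample data are assumptions. The depth cap is there because this sample graph contains a cycle, and tracking depth would otherwise make the recursion run forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (
        src INTEGER NOT NULL REFERENCES nodes(id),
        dst INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (src, dst)
    );
    INSERT INTO nodes VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
    -- a -> b -> c -> a is a cycle; c -> d -> e is a tail.
    INSERT INTO edges VALUES (1,2),(2,3),(3,1),(3,4),(4,5);
""")

# Walk outward from a start node, recording how many hops each row took.
REACHABLE = """
WITH RECURSIVE reach(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach AS r JOIN edges AS e ON e.src = r.id
    WHERE r.depth < ?
)
SELECT n.label, MIN(reach.depth) AS hops
FROM reach JOIN nodes AS n ON n.id = reach.id
GROUP BY n.label
ORDER BY hops, n.label
"""

# Parameters: start node id, maximum depth to explore.
rows = conn.execute(REACHABLE, (1, 10)).fetchall()
print(rows)  # [('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)]
```

The `MIN(depth)` in the outer query gives the shortest hop count per node, so the path length can be queried without fixing the number of joins in advance.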

I don't quite get what you mean by "the relational model itself cannot fully represent the graph model". If you can store all the edges of a graph and query them, what can't be done?

Graph databases, depending on the underlying storage model, can have performance advantages over relational databases when the JOINs are what you are interested in. This is because a JOIN in SQL is often O(n), or O(log n) if the join column is indexed, where n is the number of rows in the target table, but following a relationship in a graph might be O(1) if you store an object with pointers to other objects right there. Writes might be more expensive though.
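A toy Python sketch of the two lookup styles, under heavy assumptions: a sorted list stands in for an index on the edge table, and plain object references stand in for stored pointers. It illustrates the asymptotic argument only and says nothing about how any real engine lays out its data.

```python
from bisect import bisect_left

# Relational-style: all edges in one sorted (src, dst) list, like an index.
edge_index = sorted([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])


def neighbors_via_index(src):
    """Binary search (O(log n)) for the run of rows with this integer src."""
    lo = bisect_left(edge_index, (src,))
    hi = bisect_left(edge_index, (src + 1,))
    return [dst for _, dst in edge_index[lo:hi]]


# Graph-style: each node object holds direct references to its neighbors.
class Node:
    def __init__(self, key):
        self.key = key
        self.out = []


nodes = {k: Node(k) for k in range(1, 6)}
for s, d in edge_index:
    nodes[s].out.append(nodes[d])


def neighbors_via_pointers(node):
    """No search: the neighbor list is already attached to the node."""
    return [n.key for n in node.out]


print(neighbors_via_index(1), neighbors_via_pointers(nodes[1]))  # [2, 3] [2, 3]
```

The write-cost remark shows up here too: adding an edge in the pointer version means touching the node object itself rather than appending a row.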

It all comes down to use case.


danenania 2460 days ago

It seems similar to the SQL vs. NoSQL dichotomy in that it's a question of general ad hoc queries vs. predetermined queries. SQL lets you query anything with or without an index, but you pay for this flexibility with limits on performance and scalability.

In the case of graphs, if you know the exact shape of your graph and its query patterns ahead of time, you can design the optimal structure in a KV/NoSQL store. But if you need/want the flexibility to do ad hoc queries of the graph in a reasonably performant way, that's where the graph db shines.


ithkuil 2460 days ago

A rather long write-up but gives some context about why it's hard to build a graph data model as a layer on top of commodity non-graph databases: https://blog.dgraph.io/post/why-google-needed-graph-serving-...

(Obviously, the underlying storage layer of a graph db will use some sort of simpler storage layer, usually some kind of key value store)


namelosw 2460 days ago

Some database designs, like extremely simple key-value databases, cannot efficiently express a join unless they load all the data into memory. The same could be said for column-based databases, etc. I guess that's quite a difference.


danenania 2460 days ago

"key-value databases cannot efficiently express a join unless they load all the data into memory"

With the right design can't you store the graph data across multiple keys and then load pieces of it selectively into memory? I get that a graph db specializes in this pattern and makes it more efficient, easier to query, etc., but that doesn't mean it's the only type of db that can model a graph.
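A rough sketch of that idea, assuming one key per node whose value is the node's neighbor list. `CountingKV` is a hypothetical stand-in for a real key-value store, and the `adj:` key prefix is an invented convention. The store records which keys were read, so it is visible that the traversal only touches the part of the graph it walks.

```python
class CountingKV:
    """Hypothetical key-value store that records every key it is asked for."""

    def __init__(self, data):
        self._data = data
        self.reads = []

    def get(self, key):
        self.reads.append(key)
        return self._data.get(key, [])


store = CountingKV({
    "adj:a": ["b", "c"],
    "adj:b": ["d"],
    "adj:c": ["d"],
    "adj:d": [],
    "adj:x": ["y"],  # separate component, never read below
    "adj:y": ["x"],
})


def reachable(kv, start):
    """Depth-first walk that fetches one adjacency list per visited node."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in kv.get(f"adj:{node}"):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


result = reachable(store, "a")
print(sorted(result))       # ['a', 'b', 'c', 'd']
print(sorted(store.reads))  # ['adj:a', 'adj:b', 'adj:c', 'adj:d']
```

Each hop is a separate `get`, which against a remote store means one round trip per visited node.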


namelosw 2460 days ago

Technically it could, just by storing all the keys in a single value, which could be massive.

The design of relational databases makes the database itself responsible for executing a lot of logic according to the query language. Graph databases usually also include their own query language in order to do the same thing.

For example, suppose that out of trillions of records, 100 have field x equal to value y, and the job is to fetch just those 100 rather than all of them. The database takes a short query and evaluates the predicate inside the database process, instead of sending the data back to a client, which is usually located on another physical machine.
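A scaled-down sketch of the difference, using an in-memory SQLite table of 100,000 rows in place of trillions. The table name, column names, and value distribution are invented. SQLite runs in the client's own process, so no network is involved here; the row counts only stand in for what would have to cross the wire with a remote database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany(
    "INSERT INTO records (x) VALUES (?)",
    [(i % 1000,) for i in range(100_000)],  # each x value appears 100 times
)
conn.execute("CREATE INDEX idx_records_x ON records (x)")

# Predicate evaluated inside the database: only matching rows come back.
pushed = conn.execute("SELECT id FROM records WHERE x = ?", (42,)).fetchall()

# Predicate evaluated by the client: every row comes back first.
everything = conn.execute("SELECT id, x FROM records").fetchall()
filtered = [(rid,) for rid, x in everything if x == 42]

print(len(pushed), len(everything))  # 100 100000
```

Both approaches produce the same 100 rows; the difference is how much data had to leave the database to get them.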
