Hacker News new | ask | show | jobs
by Cieplak 3843 days ago
They mention using Neo4j instead of a relational database. Recently I needed to implement a friendship data model, and started off using Neo4j because it seemed like the best tool for the task, but ended up using essentially the following model in Postgres:

    CREATE TABLE friendship (
     person1 INT NOT NULL,
     person2 INT NOT NULL,
     PRIMARY KEY (person1, person2),
     CHECK (person1 < person2)
    );
    CREATE INDEX friendship_person2 ON friendship(person2);


    CREATE VIEW friendship_view AS
    SELECT person1 AS person, person2 AS friend 
    FROM friendship
    UNION
    SELECT person2 AS person, person1 AS friend FORM friendship;
taken from here: http://www.postgresql.org/message-id/20141111201127.d80b6bc4...

What do you folks think? It seems like this would be the densest way to store the relationship, using the checked constraint trick with the IDs so you don't need duplicate records to store Alice and Bob's friendship as {A -> B, B -> A} if the friendship is inherently bidirectional (vs a followed/follower model), and doesn't preclude one from storing the data in another isomorphic structure that might be optimized for specific queries. I'm sure it's possible to replicate this with Neo4j, but it felt cumbersome programming that in Java versus expressing that in SQL. But I'd really like to use Neo4j in an OLTP environment, but still haven't had enough of an impetus yet because WITH RECURSIVE in postgres works well if the recursion depth is capped.

1 comments

Your `friendship` table is a simple adjacency list representation for an undirected graph. As you point out, the main trouble in using a relational database for a true graph model is to process graph traversals, which you can do with nested inner queries, joins, or Postgres' WITH RECURSIVE. A graph database, for that reason, is usually optimized as a very efficient sequential JOIN machine, where you can imagine successive JOINS as connecting the endpoints of edges along a graph path.

By the way, if you're interested in graph representations for efficient querying, check out TripleStores [1], and the Wiki list on subject-predicate-object databases [2].

[1] https://en.wikipedia.org/wiki/Triplestore [2] https://en.wikipedia.org/wiki/List_of_subject-predicate-obje...

Thanks! I was just reading the neo4j source and noticed the bitmap indexes which apparently have several advantages to B-tree indexes for indexing graphs:

https://github.com/neo4j/neo4j/tree/3.0/community/lucene-ind...

ps:

http://stackoverflow.com/questions/9541541/b-tree-vs-bitmap-...

https://en.wikipedia.org/wiki/Bitmap_index

Thanks for showing me triplestores! Good to know that it's better to use a database engine optimized for triples for graphs vs rolling your own in SQL.