by marknadal 3872 days ago
Interesting proposition, but the majority of its design is flawed. Responding to the design doc:

Triples are common in graph databases, but both gun and Neo4j are closer to property graphs (Neo4j has mandatory edge nodes, gun does not). In my personal opinion that model is actually useful, while triples are more of an academic thing (note that I am biased, because I am the author of gun). He chucks conflict resolution up to some fairly nondeterministic behavior that will ultimately require a lot of gossip, which makes resolution hard and untimely. He also suggests that less popular content should be charged more, which I think worsens problems that already exist in things like BitTorrent rather than mitigating them.


Yeah, a lot of what you're saying I agree with.

In the several months since I wrote that I've changed my mind on a lot of things. The new design uses a sharded architecture (on hash of topic id) with nodes having a specific keyspace. This makes it much more robust and allows for actual consistency. Since all data will be treated equally, there will no longer be a penalty for less used data.
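To make the sharding idea concrete, here is a minimal sketch, not the actual design. It assumes SHA-256 as the hash, a 32-bit keyspace, and an even split of that keyspace into contiguous ranges; the names `topic_hash`, `ShardMap`, and `node_for` are made up for illustration.

```python
import hashlib
from bisect import bisect_right


def topic_hash(topic_id: str, keyspace_bits: int = 32) -> int:
    """Map a topic id to a position in the keyspace (assumed: SHA-256)."""
    digest = hashlib.sha256(topic_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << keyspace_bits)


class ShardMap:
    """Hypothetical shard map: each node owns one contiguous keyspace range."""

    def __init__(self, node_names, keyspace_bits: int = 32):
        self.keyspace_bits = keyspace_bits
        self.nodes = list(node_names)
        size = 1 << keyspace_bits
        n = len(self.nodes)
        # starts[i] is the inclusive lower bound of the range owned by nodes[i].
        self.starts = [i * size // n for i in range(n)]

    def node_for(self, topic_id: str) -> str:
        """Return the node whose range contains the hash of topic_id."""
        h = topic_hash(topic_id, self.keyspace_bits)
        # starts[0] is 0, so bisect_right always returns at least 1 here.
        return self.nodes[bisect_right(self.starts, h) - 1]


shards = ShardMap(["node-a", "node-b", "node-c"])
owner = shards.node_for("topic:graph-databases")
```

Because ownership depends only on the hash of the topic id, every topic lands on a node the same way regardless of how often it is read, which is one way to see why no penalty for less used data is needed.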

The main reason I originally thought that less used data should be penalized is that it takes up a lot of resources for things that aren't used by the majority of users. However, that's hard to track and makes it difficult to propagate inserts.

As for triples vs property graphs, they're functionally equivalent. I'm using triples because they've been shown to work quite well at scale, for example in Google's massive Knowledge Graph.
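As a rough illustration of that equivalence, the sketch below flattens a tiny property graph into triples. The input shapes and the function name `property_graph_to_triples` are assumptions for this example only. Edge properties are handled by reification (giving the edge its own identifier), using the `rdf:subject` / `rdf:predicate` / `rdf:object` terms from the RDF vocabulary as plain strings.

```python
def property_graph_to_triples(nodes, edges):
    """Flatten a property graph into (subject, predicate, object) triples.

    nodes: {node_id: {property: value}}
    edges: {edge_id: (source_id, label, target_id, {property: value})}
    Both shapes are hypothetical and chosen for this sketch.
    """
    triples = []
    # Each node property becomes one triple.
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for edge_id, (src, label, dst, props) in edges.items():
        # The edge itself is one triple.
        triples.append((src, label, dst))
        if props:
            # Reify the edge so its properties have a subject to attach to.
            triples.append((edge_id, "rdf:subject", src))
            triples.append((edge_id, "rdf:predicate", label))
            triples.append((edge_id, "rdf:object", dst))
            for key, value in props.items():
                triples.append((edge_id, key, value))
    return triples


nodes = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}
edges = {"e1": ("alice", "knows", "bob", {"since": 2015})}
triples = property_graph_to_triples(nodes, edges)
```

The extra reification triples are the cost of the translation: a property on an edge is a single entry in a property graph but several triples here, which is part of why the two models feel different in practice even if they can express the same data.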

In the long run, having a triple-based API does not mean you need to have a single triple-table-based storage.

The SPARQL world is an interesting example of this: different systems have very different data layouts while sharing the same basic query language.

Compared to the top post: I think triple systems are much more scalable than Neo4j, even if not as popular. There are a few triple systems with trillion-node benchmarks, and even more with 100 billion plus. Neo4j has at most ~34 billion relationships, and no more than 274 billion triples. Those are hard limits per the current Neo4j documentation, but I have not heard of any Neo4j systems in production at that scale. Meanwhile, I know of at least one SPARQL system that is running with 4 trillion edges (http://allegrograph.blogspot.ch/2015/11/allegrograph-news-no...).

Great thoughts. I was really impressed with how sharp your thinking was in your original document, I just disagreed with its direction. Interesting to hear that you've since revised things. Want to do a Skype call or something on this stuff? Shoot me an email: mark@gunDB.io .