by d4l3k
3866 days ago
Yeah, I agree with a lot of what you're saying. In the several months since I wrote that, I've changed my mind on a lot of things. The new design uses a sharded architecture (sharded on the hash of the topic id), with each node owning a specific keyspace. This makes it much more robust and allows for actual consistency. Since all data will be treated equally, there will no longer be a penalty for less-used data. The main reason I originally thought less-used data should be penalized is that it takes up a lot of resources for things that aren't used by the majority of users. However, usage is hard to track, and penalizing it makes inserts difficult to propagate. As for triples vs. property graphs, they're functionally equivalent. I'm using triples because they've been shown to work quite well at scale, as in Google's massive Knowledge Graph.
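To make the sharding idea concrete, here is a minimal sketch of routing a triple to a node by hashing its topic id. This is not the project's actual code: the `ShardMap` class, the `topic_hash` helper, the choice of SHA-256, the 64-bit keyspace, and the even split between nodes are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right

KEYSPACE_SIZE = 2 ** 64  # assumed keyspace: unsigned 64-bit hashes


def topic_hash(topic_id: str) -> int:
    """Map a topic id to a point in the keyspace (SHA-256 is an assumption)."""
    digest = hashlib.sha256(topic_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ShardMap:
    """Hypothetical shard map: each node owns one contiguous keyspace range."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one node is required")
        self.nodes = list(node_names)
        count = len(self.nodes)
        # Node i owns hashes from starts[i] up to (not including) starts[i + 1].
        self.starts = [i * KEYSPACE_SIZE // count for i in range(count)]

    def node_for_hash(self, value: int) -> str:
        # starts[0] is 0, so bisect_right always returns at least 1 here.
        return self.nodes[bisect_right(self.starts, value) - 1]

    def node_for(self, topic_id: str) -> str:
        return self.node_for_hash(topic_hash(topic_id))


shards = ShardMap(["node-a", "node-b", "node-c", "node-d"])
triple = ("/m/topic-123", "name", "Example")  # (subject, predicate, object)
owner = shards.node_for(triple[0])
```

Because the owner depends only on the topic id, every triple about a topic lands on the same node regardless of how often it is read, which is what lets all data be treated equally.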
In the SPARQL world that is actually quite interesting, since different systems use very different data layouts while sharing the same basic query language.
Regarding the top post: I think triple stores are much more scalable than Neo4j, even if they are not as popular. A few triple stores have published benchmarks at a trillion nodes, and even more at 100 billion plus. Neo4j supports at most ~34 billion relationships and no more than 274 billion triples; those are hard limits per the current Neo4j documentation. And I have not heard of any Neo4j system in production even at that scale, while I know of at least one SPARQL system that is running with 4 trillion edges (http://allegrograph.blogspot.ch/2015/11/allegrograph-news-no...).