DegDB, an open-source distributed graph database

	DegDB, an open-source distributed graph database (github.com)
	55 points by cydrobolt 3867 days ago

5 comments

barakm 3866 days ago

Hey, maintainer of Cayley ( https://github.com/google/cayley ) here. Glad to see more people interested in this space!

It looks like your code is mostly stubbed out right now -- using gorm for any persistence. You'll find that storing a basic graph is easy; doing more complex traversals and optimization is a harder problem.

I can appreciate your concept of experimenting with adding monetary incentives on top of requests and datasets. Graphs can be useful for this, if you view a graph as a (requestable) URI of triples.

I'd recommend linking in Cayley as your backend (you can use it as a library), and dealing with the requests/economics as an API layer on top. The benefit of open source is you don't have to reimplement everything yourself.

And if you have novel notions on how to distribute a graph, that could be interesting; feel free to ping me. I warn you that it's a hard problem and a bold claim in a number of ways -- it's not something you just build without working with a couple people.

d4l3k 3866 days ago

Hey, I'm the developer for degdb. I'm really glad to see all the interest in this project, even in its very rough initial state.

I'm aware that gorm isn't a great option for graph storage, but it seemed to be the easiest way of handling data storage initially. A lot of this project was written at a hackathon in ~36 hours, but I've been refactoring.

I looked at Cayley (and have it as a dependency in an attempt to borrow the Gremlin parser). However, it doesn't seem to have a great way to store "metadata". How would you recommend adding fields to triples such as language, author, creation date, and cryptographic signature? Serializing and shoving them into Quad.Label seems kinda hacky.
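
One way to picture the metadata question is a record that carries the extra fields next to the triple and signs a canonical serialization of all of them. This is only a sketch under stated assumptions: the `Triple` class, its field names, and the `sign`/`verify` helpers are hypothetical and are not degdb's or Cayley's API, and HMAC-SHA256 stands in for a real public-key signature scheme.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical triple record with metadata stored as first-class fields."""
    subject: str
    predicate: str
    obj: str
    language: str
    author: str
    created: str  # ISO 8601 timestamp, kept as a plain string here


def canonical_bytes(triple: Triple) -> bytes:
    # Sorted keys and fixed separators make the byte form deterministic,
    # so the same triple always produces the same signature input.
    return json.dumps(
        asdict(triple), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


def sign(triple: Triple, key: bytes) -> str:
    # Assumption: a shared-secret HMAC as a stand-in for a real signature.
    return hmac.new(key, canonical_bytes(triple), hashlib.sha256).hexdigest()


def verify(triple: Triple, key: bytes, signature: str) -> bool:
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(sign(triple, key), signature)
```

Because the signature covers the metadata as well as the subject, predicate, and object, changing the author or creation date invalidates it, which is hard to guarantee when the same fields are serialized into a single label string.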

amitport 3866 days ago

Nice. I implemented basically the same thing about 5 years ago: https://code.google.com/p/graphpack/ (future dev on https://github.com/amitport/graphpack)

I never had time to publish proper documentation, though.

You should also check out the paper that was published a few years after: http://onlinelibrary.wiley.com/doi/10.1002/spe.2226/abstract

d4l3k 3866 days ago

That's really neat. Any reason you decided to abandon work on it?

Non pay-walled version: http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...

amitport 3865 days ago

As @barakm said in the comment above, "it's not something you just build without working with a couple people"...

It requires a lot of work to get to a "real" product, which includes documentation, a basic website, traction with users, etc. I eventually got side-tracked with life and other work. I would be very happy to continue work on it if someone is willing to actively help out.

marknadal 3866 days ago

Interesting proposition, but the majority of its design is flawed. Responding to the design doc:

Triples are common in graph databases; however, both gun and Neo4j use something closer to property graphs (while Neo4j has mandatory edge nodes, gun does not), which in my personal opinion is actually useful, while triples are more of an academic thing (note that I am biased because I am the author of gun). He chucks conflict resolution up to some fairly nondeterministic behavior that will ultimately require a lot of gossip, which then makes resolution hard and untimely. He also suggests that less popular content should be charged more, which I think worsens problems that already exist in things like BitTorrent rather than mitigating them.

d4l3k 3866 days ago

Yeah, a lot of what you're saying I agree with.

In the several months since I wrote that I've changed my mind on a lot of things. The new design uses a sharded architecture (on hash of topic id) with nodes having a specific keyspace. This makes it much more robust and allows for actual consistency. Since all data will be treated equally, there will no longer be a penalty for less used data.
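
A minimal sketch of what sharding on a hash of the topic id could look like, assuming a 64-bit keyspace cut into equal contiguous ranges with one range per node. The `Ring` class, the node names, and the choice of SHA-256 are illustrative assumptions, not degdb's actual design.

```python
import hashlib
from bisect import bisect_right

KEYSPACE_SIZE = 2 ** 64  # assumed keyspace width for this sketch


def topic_key(topic_id: str) -> int:
    # Hash the topic id and keep the first 8 bytes as a 64-bit key.
    digest = hashlib.sha256(topic_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical static assignment of keyspace ranges to nodes."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        count = len(self.nodes)
        # Node i owns keys from starts[i] up to (not including) starts[i + 1].
        self.starts = [i * KEYSPACE_SIZE // count for i in range(count)]

    def owner(self, topic_id: str) -> str:
        # Find the last range whose start is <= the topic's key.
        index = bisect_right(self.starts, topic_key(topic_id)) - 1
        return self.nodes[index]
```

Every triple about the same topic hashes to the same key, so each insert and query has one well-defined owner. Rebalancing when nodes join or leave is left out of this sketch.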

The main reason I originally thought that less used data should be penalized is that it takes up a lot of resources for things that aren't used by the majority of users. However, that's a hard thing to track and makes it difficult to propagate inserts.

As for triples vs property graphs, they're functionally equivalent. I'm using triples because they've been shown to work quite well at scale such as Google's massive Knowledge Graph.
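
The equivalence can be illustrated with a small round trip: a node's properties become one triple per property, and an edge that carries properties is turned into its own node (reification). This is a sketch under simplifying assumptions: the function names and the `source`/`label`/`target` predicates are made up for illustration, and each key is assumed to hold a single value.

```python
def node_to_triples(node_id, properties):
    # One (subject, predicate, object) triple per property of the node.
    return [(node_id, key, value) for key, value in sorted(properties.items())]


def edge_to_triples(edge_id, source, label, target, properties):
    # An edge with its own properties is reified: it gets an id and is
    # described by triples, just like a node.
    triples = [
        (edge_id, "source", source),
        (edge_id, "label", label),
        (edge_id, "target", target),
    ]
    return triples + node_to_triples(edge_id, properties)


def triples_to_nodes(triples):
    # Rebuild property maps from triples. Assumes one value per key, so a
    # repeated (subject, predicate) pair would overwrite the earlier value.
    nodes = {}
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {})[predicate] = obj
    return nodes
```

The triple form needs several rows for what a property graph stores as one edge, so the two models can express the same data while still differing in storage and query cost.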

jerven 3866 days ago

In the long run, having a triple-based API does not mean you need to have storage based on a single triple table.

In the SPARQL world that is actually quite interesting as different systems have very different data layouts while maintaining the same basic query language.
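
As a rough illustration of that separation, the sketch below exposes a triple-pattern API while storing the data in one table per predicate instead of a single triple table. The class and method names are hypothetical and do not correspond to any particular SPARQL system.

```python
from collections import defaultdict


class PredicatePartitionedStore:
    """Triple-shaped API over per-predicate tables (illustrative only)."""

    def __init__(self):
        # predicate -> set of (subject, object) pairs
        self._tables = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._tables[predicate].add((subject, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard, as in a basic triple pattern.
        if predicate is not None:
            predicates = [predicate]
        else:
            predicates = list(self._tables)
        for pred in predicates:
            # .get avoids creating empty tables for unknown predicates.
            for subj, value in self._tables.get(pred, ()):
                if subject is not None and subj != subject:
                    continue
                if obj is not None and value != obj:
                    continue
                yield (subj, pred, value)
```

Callers only ever see triples, so the layout behind `match` could be swapped for a different one without changing the queries written against it.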

Compared to the top post, I think triple systems are much more scalable than Neo4j, even if not as popular. There are a few triple systems with trillion-node benchmarks, and even more with 100 billion plus. Neo4j has at most ~34 billion relationships, and no more than 274 billion triples. Those are hard limits per the current Neo4j documentation. But I have not heard of any Neo4j systems in production at that scale, while I know of at least one SPARQL system that is running with 4 trillion edges (http://allegrograph.blogspot.ch/2015/11/allegrograph-news-no...).

marknadal 3866 days ago

Great thoughts. I was really impressed with how sharp your thinking was in your original document; I just disagreed with its direction. Interesting to hear that you've since revised things. Want to do a Skype call or something on this stuff? Shoot me an email: mark@gunDB.io .

jerven 3866 days ago

Nice to see. Is SPARQL support planned? I am wondering because it has a triplestore directory.

In the meantime, a more interesting production-ready open source distributed graph database is worth looking at: https://www.blazegraph.com/. It scales really well and will soon have GPU support for graph traversals. It has TinkerPop and SPARQL support.

kajecounterhack 3866 days ago

https://github.com/google/badwolf is another similar project being developed at Google.

BadWolf has a SPARQL-like query language too.

jerven 3866 days ago

BadWolf is interesting in its temporal aspect. But IMHO it has dropped a bit too much from the RDF world to really take off.

Also, I paid the price for early adoption of SPARQL/RDF. I'm not looking to repeat that with an even earlier adoption of a non-standard system, especially if the temporal aspect does not appear in the data I work with.

crudbug 3866 days ago

Thanks for the Blazegraph link.

wilsonfiifi 3866 days ago

Before I clicked the actual link I was secretly praying it would be written in Go so I could comb through the source code and understand its inner workings! Thanks for sharing this and making it open so that we less experienced systems devs can learn!
