Hacker News
by enzotar 2240 days ago
Can anyone share their production experience with Dgraph?
7 comments

This wasn't at a huge scale, but for one project another intern and I took a proof of concept that another engineer had built with Gremlin [1], turned it into a full tool, and ended up using Dgraph. The Python bindings were easy to work with, and Ratel (the UI/web frontend) made quick searches and tests easy.

I liked working with it, so now I'm the package maintainer for it on AUR [2]. At some point I'd like to make a repo showing how to implement common graph algorithms with the Python bindings, since GraphQL+- currently only supports k-shortest path at the query level [3].

[1] https://tinkerpop.apache.org/gremlin.html
[2] https://aur.archlinux.org/packages/dgraph-bin/ , https://aur.archlinux.org/packages/dgraph-git/
[3] https://discuss.dgraph.io/t/how-about-doing-some-graph-compu...
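
A minimal sketch of the kind of client-side algorithm such a repo might hold, assuming the edges have already been fetched through the Python bindings into a plain adjacency dict. The fetch step is left out, and the function name `connected_components` and the sample UIDs are hypothetical:

```python
from collections import deque


def connected_components(adjacency):
    """Return a list of sets, one per connected component.

    `adjacency` maps a node id to an iterable of neighbour ids.
    Edges are treated as undirected.
    """
    # Build an undirected view so that every edge can be walked both ways.
    undirected = {}
    for node, neighbours in adjacency.items():
        undirected.setdefault(node, set())
        for neighbour in neighbours:
            undirected[node].add(neighbour)
            undirected.setdefault(neighbour, set()).add(node)

    seen = set()
    components = []
    for start in undirected:
        if start in seen:
            continue
        # Breadth-first search from every node not yet visited.
        queue = deque([start])
        seen.add(start)
        component = {start}
        while queue:
            current = queue.popleft()
            for neighbour in undirected[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical sample data: three linked nodes and one isolated node.
edges = {"0x1": ["0x2"], "0x2": ["0x3"], "0x4": []}
components = connected_components(edges)
```

Running the traversal on the client like this means pulling the relevant subgraph out of the database first, so it only suits graphs that fit in memory.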

Thanks for maintaining Dgraph on AUR. I'm a fan of Arch Linux.

I think the latest release is 20.03.1, perhaps time to update?

Sure thing!

Will do, I've been a little lax the past few weeks but my school semester just finished so now I should have more time again.

I'm building a product that does graph analytics on top of Dgraph.

Some constraints we have:

* We ingest what some may consider a lot of data - on the order of terabytes a day. This can be tens of thousands of writes per second.

* We need transactional logic.

* We want to analyze that data as it comes in, so think 10x reads for every write.
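
As a rough back-of-the-envelope reading of those constraints, here is a small sketch. The concrete figures (1 TB a day, 10,000 writes per second) are assumptions picked from the low end of the ranges above, not measured numbers, and `ingest_profile` is a hypothetical helper:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ingest_profile(bytes_per_day, writes_per_second, reads_per_write=10):
    """Derive per-second figures from a daily ingest volume."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return {
        "bytes_per_second": bytes_per_second,
        "avg_bytes_per_write": bytes_per_second / writes_per_second,
        "reads_per_second": writes_per_second * reads_per_write,
    }


# Assumed inputs: 1 TB/day (10**12 bytes) at 10,000 writes/s, 10 reads per write.
profile = ingest_profile(bytes_per_day=10**12, writes_per_second=10_000)
```

With those assumed inputs, the ingest rate works out to roughly 11.6 MB/s, a little over 1 KB per write on average, and 100,000 reads per second.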

GraphDBs out there didn't seem like they would cut it. I eliminated almost every database due to:

* Bad licensing

* Incapable of scaling writes horizontally, or generally handling tons of writes

* No ACID transactions

Most graphdbs out there had at least two of these issues.

Dgraph so far has worked really well. We aren't sending it the full load of data yet, so there's still a question around that write load, but at least it's designed for that, and initial numbers have been promising.

The fact that it's liberally licensed, has a really good pricing model, has strong community support, good docs, etc, has made me glad I chose it.

The roughest part is probably the query language, because it's bespoke and therefore ends up having weird unexpected behavior sometimes. Now that it supports GraphQL that should be less of an issue.

I have been developing a POC using dgraph for the last several months. I can't really comment on its robustness at production load since it was only used for local development.

But I can comment that getting the "right syntax" was at times extremely frustrating. It has a lot of "there is just one way to do it, and you have to spend a month reading our code to find it" kind of thing going on. It is definitely "beta" software in that regard, and the ease of use of its query language (languages) is abysmal. The documentation is also extremely confusing and incomplete, to say the least. Needs a lot more examples and a lot more "ways to skin a cat" than currently documented.

On a good note, the support provided via discuss.dgraph.io is really good, even though there are so many people struggling to make it do simple things - that support forum will likely be a place where the answer can be found, or someone can answer (rather quickly) with some help.

We now have a dedicated technical writer on the team, whose sole job is to improve our technical documentation. If you have suggestions, it would be great if you could jot them down at [1] so we can act on them.

[1]: https://discuss.dgraph.io

Somebody in my organization got very interested in Dgraph in its 0.7 or 0.8 days. The version was marked as "production ready", but it was an absolute trainwreck.

We were modelling individuals and contacts between them, and the cluster would constantly break with dataset sizes that should have been easily managed. There was clearly something wrong in the storage engine, because we saw insane disk space usage. Dgraph consumed 10s of TB for something that should have taken < 100 GB.

We were one of the largest installations at the time, and were working with the core development team, but they were never able to resolve the issues.

We eventually had to tell management that there was no way we'd be able to operate the thing given its disk space consumption rate, so we had to delay project delivery to rip out Dgraph and replace it with Postgres.

Surely it's better today, but I'll never use it again by choice.

I don't agree with this claim of a huge database size gap.

Dgraph is built for performance, and with one of our apps we faced similar challenges. After reading some documentation and watching some of their videos, I learned that it trades disk space for performance when there are lots of indexes. We reduced the number of indexes from 35 to 8 and the DB size dropped drastically.

I believe you should investigate that as well, to check whether the problem lies in your database architecture design.

As for your gap of <100 GB versus 10,000 GB: I assume you created lots of indexes. With a good database design and fewer indexes, you will have a small, high-performance app.
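
A minimal sketch of what trimming indexes can look like in practice, assuming Dgraph's schema syntax of `predicate: type @index(tokenizer) .`. The predicate names and the `build_schema` helper are hypothetical; the idea is simply to index only the predicates you actually filter or search on:

```python
def build_schema(predicates):
    """Render a Dgraph-style schema string.

    `predicates` maps a predicate name to (scalar_type, tokenizers).
    An empty tokenizer list means the predicate gets no index at all.
    """
    lines = []
    for name, (scalar_type, tokenizers) in sorted(predicates.items()):
        if tokenizers:
            joined = ", ".join(tokenizers)
            lines.append(f"{name}: {scalar_type} @index({joined}) .")
        else:
            lines.append(f"{name}: {scalar_type} .")
    return "\n".join(lines)


# Hypothetical predicates: only the two that are queried by value keep an index.
predicates = {
    "name": ("string", ["term"]),
    "email": ("string", ["exact"]),
    "bio": ("string", []),
    "signup_count": ("int", []),
}
schema = build_schema(predicates)
```

The resulting string could then be applied with whatever client you use, for example through pydgraph's `client.alter(pydgraph.Operation(schema=schema))`. That step is left out here because it needs a running cluster.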

v0.7? That was Dec 2016. A lot has changed in 3.5 years.

I love working with Dgraph but I hate DBA-related work. As a result I only use it for local projects. I’ve once experienced the data becoming inaccessible and being unable to restart the docker containers (which sadly happened before I began to regularly export data, but luckily with data that wasn’t too important), but I’ve otherwise only had positive experiences (micheldiz is great).

I’d probably use it for pet projects if they made it easier to automate backups to the cloud, and I’d use it for all projects if they offered a hosted solution.

Something is in the works around a managed service!

My experience is that it’s a memory hog. There’s a ton of tiny caveats that are easy to overlook until you spend a good amount of time debugging an issue. The team and community are active and helpful. The overall experience is positive and the technology nifty.

We are doing PoCs around it -- however the text search is not ready for prime-time. https://github.com/dgraph-io/dgraph/issues/5102

FWIW, we do a lot of 'GPU visual graph analytics and investigation for X' work at Graphistry, where X is hooking into either graph DBs (neo4j, ...) or doing as a virtual / on-the-fly layer over other data systems (Splunk, jupyter notebooks, ...). Almost all of our users' graph projects have ended up involving text search, and as part of that, search indexes. Think security, fraud, genetics, etc. I can only think of a few exceptions that did not need text, such as blockchain viz. I just sort of assume text fields as part of linking data nowadays. In fact, a lot of our recent work is going to the next level, where we use ML algs to compute over text to infer even fuzzier connections, vs simple ID/string/regex matching from the older days of graph tech.

So at least for domains where people want to make correlations over data such as logs, events, transactions, CSVs, etc., I encourage dgraph folks to watch discussions of text closely.

Fun recent example that illustrates this: For ProjectDomino.org (COVID anti-misinfo), we started by ingesting the covid twitter firehose into a graphdb for easy and fast pivoting by tweet/account/etc. However, our analysts need to search by text, and a lot of our current work is now doing ML/graph algorithms to mine the text to infer fuzzy edges: GPU BERT, GPU UMAP, ... . Neo4j supports setting up various text indexes which helps search, but for analytics, we end up having to extract the data out of the DB, infer relationships & scores, and put them back in.

(author of Dgraph) We want to improve full text search, to bring it in line with Elasticsearch. A lot of people compare Dgraph against Elastic, because they'd rather just have one solution (Dgraph) instead of two.

It's in our backlog to improve FTS drastically from where it stands today.

Can any of the infrastructure for indexing edge properties be reused for FTS?

I'd similarly been evaluating a Python client implementation a while back and found the developer experience a little rough around the edges [1].

It's reassuring to see Dgraph undergoing the full Jepsen treatment, even if it highlights that there's still a bit of work to do, and further stability to prove.

[1] - https://github.com/dgraph-io/pydgraph/issues/94