by chintan 2237 days ago
We are doing PoCs around it -- however the text search is not ready for prime-time. https://github.com/dgraph-io/dgraph/issues/5102
3 comments

FWIW, we do a lot of 'GPU visual graph analytics and investigation for X' work at Graphistry, where X is either hooking into graph DBs (neo4j, ...) or acting as a virtual / on-the-fly layer over other data systems (Splunk, jupyter notebooks, ...). Almost all of our users' graph projects have ended up involving text search, and as part of that, search indexes. Think security, fraud, genetics, etc. I can only think of a few exceptions that did not need text, such as blockchain viz. I just sort of assume text fields as part of linking data nowadays. In fact, a lot of our recent work is going to the next level, where we use ML algs to compute over text to infer even fuzzier connections, vs simple ID/string/regex matching from the older days of graph tech.

So at least for domains where people want to make correlations over data such as logs, events, transactions, CSVs, etc., I encourage dgraph folks to watch discussions of text closely.

Fun recent example that illustrates this: For ProjectDomino.org (COVID anti-misinfo), we started by ingesting the covid twitter firehose into a graphdb for easy and fast pivoting by tweet/account/etc. However, our analysts need to search by text, and a lot of our current work is now doing ML/graph algorithms to mine the text to infer fuzzy edges: GPU BERT, GPU UMAP, ... Neo4j supports setting up various text indexes, which help search, but for analytics, we end up having to extract the data out of the DB, infer relationships & scores, and put them back in.
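
To make the "extract, infer relationships & scores, put them back" loop concrete, here is a minimal Python sketch. It is not the actual pipeline: token-set Jaccard similarity stands in for the BERT/UMAP steps, and the `tweets` dict, the `infer_fuzzy_edges` helper, and the 0.5 threshold are all made up for illustration. The resulting (source, target, score) triples are what you would then write back into the graph DB as weighted edges.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase word tokens; a crude stand-in for a learned text embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_fuzzy_edges(nodes, threshold=0.5):
    """Return (id_a, id_b, score) for each pair whose text similarity meets the threshold.

    `nodes` maps a node id to its text field (hypothetical export format).
    """
    toks = {node_id: tokens(text) for node_id, text in nodes.items()}
    edges = []
    for a, b in combinations(sorted(toks), 2):
        score = jaccard(toks[a], toks[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical rows, as if exported from the graph DB.
tweets = {
    "t1": "Masks reduce covid spread",
    "t2": "masks reduce the spread of covid",
    "t3": "New graph database release today",
}

edges = infer_fuzzy_edges(tweets, threshold=0.5)
for src, dst, score in edges:
    print(src, dst, round(score, 3))
```

The pairwise loop is quadratic in the number of nodes, which is part of why this kind of work ends up on GPUs or behind an index once the data gets large.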

(author of Dgraph) We want to improve full text search, to bring it in line with Elasticsearch. A lot of people compare Dgraph against Elastic, because they'd rather just have one solution (Dgraph) instead of two.

It's in our backlog to improve FTS drastically from where it stands today.

Can any of the infrastructure for indexing edge properties be reused for FTS?

I'd similarly been evaluating a Python client implementation a while back and found the developer experience a little rough around the edges[1].

It's reassuring to see Dgraph undergoing the full Jepsen treatment, even if it highlights that there's still a bit of work to do, and further stability to prove.

[1] - https://github.com/dgraph-io/pydgraph/issues/94