FWIW, we do a lot of 'GPU visual graph analytics and investigation for X' work at Graphistry, where X is either a graph DB we hook into (Neo4j, ...) or another data system we sit on top of as a virtual / on-the-fly layer (Splunk, Jupyter notebooks, ...). Almost all of our users' graph projects have ended up involving text search, and as part of that, search indexes. Think security, fraud, genetics, etc. I can only think of a few exceptions that did not need text, such as blockchain viz. I just sort of assume text fields are part of linking data nowadays. In fact, a lot of our recent work is going to the next level, where we use ML algorithms to compute over text and infer even fuzzier connections, vs. the simple ID/string/regex matching from the older days of graph tech.

So at least for domains where people want to make correlations over data such as logs, events, transactions, CSVs, etc., I'd encourage the Dgraph folks to watch discussions of text closely.

Fun recent example that illustrates this: For ProjectDomino.org (COVID anti-misinfo), we started by ingesting the COVID Twitter firehose into a graph DB for easy and fast pivoting by tweet/account/etc. However, our analysts need to search by text, and a lot of our current work is now running ML/graph algorithms to mine the text and infer fuzzy edges: GPU BERT, GPU UMAP, etc. Neo4j supports setting up various text indexes, which helps with search, but for analytics we end up having to extract the data out of the DB, infer relationships & scores, and put them back in.
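
To make the "extract, infer relationships & scores, put them back in" loop concrete, here is a minimal sketch of the middle step. It is not our pipeline: it swaps GPU BERT/UMAP for a plain bag-of-words cosine similarity from the Python standard library, and the names (`infer_fuzzy_edges`, the `tweets` rows, the 0.5 threshold) are hypothetical, picked only for illustration. The input dict stands in for whatever you exported from the graph DB, and the output tuples are what you would write back as scored edges.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def infer_fuzzy_edges(nodes, threshold=0.5):
    """Return (src, dst, score) for every node pair whose text is similar.

    `nodes` maps a node ID to its text field. The pairwise loop is O(n^2),
    which is fine for a sketch but not for a firehose.
    """
    vectors = {node_id: Counter(tokenize(text)) for node_id, text in nodes.items()}
    edges = []
    for src, dst in combinations(sorted(vectors), 2):
        score = cosine(vectors[src], vectors[dst])
        if score >= threshold:
            edges.append((src, dst, round(score, 3)))
    return edges


# Hypothetical rows standing in for an export from the graph DB.
tweets = {
    "t1": "Masks do not stop the virus",
    "t2": "masks do NOT stop the virus!!",
    "t3": "Vaccine trial results published today",
}

# Each tuple is a candidate "similar text" edge to write back with its score.
edges = infer_fuzzy_edges(tweets)
print(edges)
```

Exact string matching would miss the t1/t2 link because of case and punctuation; the scored edge is what lets an analyst pivot between the two tweets in the graph afterwards.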