
See the above example notebook

If node or edge attributes are useful for clustering (not just account IDs but, say, entity or event sizes & times), you may want to see sub-cluster separation based on them. A naive initial UMAP approach (I think the OP's intuition) is to ignore your graph structure and run UMAP (or t-SNE, PCA, k-means, ...) on your entities as independent rows. No graph data involved (initially). You then get a scatterplot of entities where those with similar feature values sit near each other. The notebook I linked shows how to compute the UMAP on such a table. Then, instead of stopping at the scatterplot, you can draw edges between neighboring entities and play with it: it's an entity similarity graph! That's the more interesting part of the notebook, and a big step up over how we see most UMAP workflows go.
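To make the "draw edges between neighboring entities" step concrete, here is a rough sketch in plain Python. It is not the notebook's code and it does not run UMAP: it just builds a Euclidean k-nearest-neighbor graph over standardized feature rows, which is the kind of neighbor structure UMAP starts from. The function name `knn_similarity_edges` and the toy table (event size, hour of day) are made up for illustration.

```python
import math

def knn_similarity_edges(rows, k=2):
    """Link each row to its k nearest rows; returns undirected (i, j, distance) edges."""
    # Standardize each column so no single feature dominates the distance.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]
    scaled = [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

    edges = {}
    for i, a in enumerate(scaled):
        # Distances from row i to every other row, nearest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(scaled) if j != i)
        for d, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = d  # store each undirected pair once
    return [(i, j, d) for (i, j), d in sorted(edges.items())]

# Hypothetical entity table: (event_size, hour_of_day)
entities = [
    (10.0, 1.0), (11.0, 2.0), (12.0, 1.5),         # small, early
    (500.0, 22.0), (510.0, 23.0), (495.0, 21.0),   # large, late
]
edges = knn_similarity_edges(entities, k=2)
```

On this toy table the edges only connect rows within the same group, so the similarity graph already shows two clusters. On real data you would likely take the neighbors from the UMAP embedding or its internal neighbor graph rather than from raw features.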

However, it sounds like you already have a graph, which is more structure for clustering than out-of-the-box UMAP (PCA, ...) leverages. When you also have a graph of edges between those entities, such as causal events, and maybe even weights, you can combine the similarity graph with yours and run graph clustering on the combined result. The clustering then uses both the node feature similarity and your existing graph knowledge. The UMAP similarity edges are likely more correlative & speculative than your physical graph's edges, so when running graph clustering, it often helps to assign different edge weights based on your confidence in each edge.
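A minimal sketch of that combination, under some loud assumptions: the weights (1.0 for observed edges, 0.3 for similarity edges) are arbitrary placeholders for "your confidence", and a toy weighted label-propagation pass stands in for whatever graph clustering you actually use. The names `combine_edges` and `label_propagation` are hypothetical, not from any library.

```python
from collections import defaultdict

def combine_edges(observed, similarity, w_observed=1.0, w_similarity=0.3):
    """Merge two undirected edge lists into one weighted adjacency map.

    An edge present in both lists gets the sum of both weights.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for edge_list, w in ((observed, w_observed), (similarity, w_similarity)):
        for a, b in edge_list:
            adj[a][b] += w
            adj[b][a] += w
    return adj

def label_propagation(adj, max_iters=20):
    """Each node repeatedly adopts the label carrying the most edge weight among its neighbors."""
    labels = {n: n for n in adj}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            totals = defaultdict(float)
            for nbr, w in adj[node].items():
                totals[labels[nbr]] += w
            # Highest total weight wins; ties go to the smallest label for determinism.
            best = min(totals, key=lambda lab: (-totals[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical data: two observed chains plus speculative similarity edges,
# including a weak bridge c-d between the two groups.
observed = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")]
similarity = [("a", "c"), ("d", "f"), ("c", "d")]

adj = combine_edges(observed, similarity)
labels = label_propagation(adj)
```

Here the low-confidence bridge between `c` and `d` is outweighed by the observed edges, so the two groups stay separate. Raising `w_similarity` makes the clustering trust feature similarity more.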

There are more tricks you can play here. A big one for property graphs is propagating node/edge features to nearby nodes (ex: "total connected bytes/events/$/etc.") so that the UMAP has surrounding graph information available. This starts to generalize to graph neural nets (b/c you're doing label propagation), and it's something we're actively looking into and happy to chat w/ folks about: feel free to email or swing by our Slack :)
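As a rough illustration of the feature propagation trick, the sketch below adds a "total connected bytes" column to each node by summing a feature over its direct neighbors. The function name `add_neighbor_totals`, the `bytes` field, and the data are all hypothetical. It does a single one-hop sum, nothing like a full graph neural net.

```python
from collections import defaultdict

def add_neighbor_totals(node_features, edges, key="bytes"):
    """Return a copy of node_features with an extra column: the sum of `key` over direct neighbors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    enriched = {}
    for node, feats in node_features.items():
        total = sum(node_features[n][key] for n in neighbors[node])
        enriched[node] = {**feats, f"neighbor_{key}": total}
    return enriched

# Hypothetical property graph: per-node byte counts and undirected edges.
nodes = {"a": {"bytes": 100}, "b": {"bytes": 5}, "c": {"bytes": 20}}
edges = [("a", "b"), ("b", "c")]

enriched = add_neighbor_totals(nodes, edges)
```

Node `b` looks small on its own (5 bytes) but picks up `neighbor_bytes` of 120 from its surroundings, so a UMAP run on the enriched table can see that context.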