by happylion0801 1629 days ago
> If you have extra information about vertices you can use it to generate coordinates

Could you expand on this with an example? Or is there a reference or algorithm I could look at? What do you mean by having extra info to generate coordinates? I'm currently looking into graph drawing algorithms for a network viz problem, and part of it involves placing nodes in certain places, so I'm interested to see if what you're describing is something I could use.

1 comments

See the above example notebook

If node or edge attribute information is useful for clustering (not just account IDs but, say, entity or event sizes and times), you may want to see sub-cluster separation based on it. A naive initial UMAP approach (I think the OP's intuition) is to ignore your graph structure and run UMAP (or t-SNE, PCA, k-means, ...) on your entities as independent rows of a table. No graph data is involved (initially). You get a scatterplot of entities where those with similar feature values sit near each other. The notebook I linked shows how to compute UMAP on such a table. Then, instead of showing a scatterplot, you can draw edges between neighboring entities and play with it: it's an entity similarity graph! That's the more interesting part of the notebook, and a big step up over how we see most UMAP workflows go.
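To make the "entity similarity graph" idea concrete, here is a minimal sketch. It is not the notebook's code: it swaps UMAP for a plain k-nearest-neighbor search in standardized feature space, which is roughly the neighbor structure UMAP starts from. The `accounts` table, its columns, and `k=2` are made-up assumptions for illustration.

```python
import math

def knn_similarity_edges(features, k=2):
    """Link each entity to its k nearest neighbors in feature space.

    features: dict of entity id -> tuple of numeric features.
    Returns sorted undirected (a, b, distance) edges.
    """
    ids = list(features)
    dims = len(next(iter(features.values())))
    # Standardize each column so no single feature dominates the distance.
    cols = [[features[i][d] for i in ids] for d in range(dims)]
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = {i: [(features[i][d] - means[d]) / stds[d] for d in range(dims)]
              for i in ids}

    edges = {}
    for a in ids:
        others = [b for b in ids if b != a]
        nearest = sorted(others, key=lambda b: math.dist(scaled[a], scaled[b]))[:k]
        for b in nearest:
            # Store each pair once, regardless of which side found it.
            edges[tuple(sorted((a, b)))] = math.dist(scaled[a], scaled[b])
    return [(a, b, d) for (a, b), d in sorted(edges.items())]

# Hypothetical table: account -> (event count, total bytes, hour of day)
accounts = {
    "a1": (10, 2000, 9),
    "a2": (12, 2100, 10),
    "a3": (11, 1900, 9),
    "b1": (300, 90000, 23),
    "b2": (310, 95000, 22),
    "b3": (290, 88000, 23),
}
edges = knn_similarity_edges(accounts, k=2)
```

No graph data went in, but a graph comes out: the edges only connect accounts with similar features. With a real UMAP run you would typically get both 2D coordinates for layout and a neighbor graph like this one.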

However, it sounds like you already have a graph, which is more structure for clustering than out-of-the-box UMAP (or PCA, ...) uses. When you also have edges between those entities, such as causal events, and maybe even weights, you can combine the similarity graph with yours and run graph clustering on the combined result. The clustering then uses both the node feature similarity and your existing graph knowledge. The UMAP similarity edges are likely more correlative and speculative than your physical graph's edges, so when running graph clustering, it often helps to assign different edge weights based on your confidence in each edge.
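A rough sketch of combining the two edge sets with confidence weights, then clustering the result. The weights (1.0 for physical edges, 0.3 for similarity edges) are arbitrary assumptions, and the tiny weighted label propagation here is only a stand-in for whatever graph clustering you actually use (Louvain or similar). All names and data are hypothetical.

```python
from collections import defaultdict

def combine_edges(physical, similarity, w_physical=1.0, w_similarity=0.3):
    """Merge two undirected edge lists into one weighted edge dict."""
    weights = defaultdict(float)
    for a, b in physical:
        weights[tuple(sorted((a, b)))] += w_physical
    for a, b in similarity:
        weights[tuple(sorted((a, b)))] += w_similarity
    return dict(weights)

def label_propagation(weights, max_rounds=20):
    """Each node repeatedly adopts the label with the most incident edge weight."""
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    labels = {n: n for n in adj}
    for _ in range(max_rounds):
        changed = False
        for n in sorted(adj):  # fixed order keeps the result deterministic
            score = defaultdict(float)
            for nbr, w in adj[n].items():
                score[labels[nbr]] += w
            # Ties go to the alphabetically smallest label.
            best = max(sorted(score), key=lambda lab: score[lab])
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical inputs: observed edges, plus feature-similarity edges.
physical = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3"), ("a3", "b1")]
similarity = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
              ("b1", "b2"), ("b1", "b3"), ("b2", "b3")]

weights = combine_edges(physical, similarity)
labels = label_propagation(weights)
```

Here the single physical edge between `a3` and `b1` is outweighed by the edges inside each group, so the two groups stay separate. Raising or lowering `w_similarity` is the knob for how much you trust the feature-based edges.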

There are more tricks you can play here. A big one for property graphs is propagating node/edge features to nearby nodes (ex: "total connected bytes/events/$/etc.") so that UMAP has the surrounding graph information available. This starts to generalize to graph neural nets (b/c you're doing label propagation), which is something we're actively looking into and happy to chat w/ folks about: feel free to email or swing by our Slack :)
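The feature propagation trick can be sketched as a one-hop aggregation: each node gets extra columns summarizing its adjacent edges and neighbors, and the enriched table is what you would then feed to UMAP. The column names (`connected_bytes`, `degree`, `neighbor_events`) and the data are assumptions for illustration only.

```python
def add_neighborhood_totals(node_features, edges):
    """Return a copy of node_features with one-hop summaries added per node.

    edges: list of (src, dst, nbytes) tuples, treated as undirected.
    """
    enriched = {n: dict(f) for n, f in node_features.items()}
    for feats in enriched.values():
        feats.update(connected_bytes=0, degree=0, neighbor_events=0)
    for src, dst, nbytes in edges:
        for node, other in ((src, dst), (dst, src)):
            enriched[node]["connected_bytes"] += nbytes  # edge feature pushed onto nodes
            enriched[node]["degree"] += 1
            enriched[node]["neighbor_events"] += node_features[other]["events"]  # neighbor's feature
    return enriched

# Hypothetical property graph
node_features = {"a": {"events": 5}, "b": {"events": 2}, "c": {"events": 9}}
edges = [("a", "b", 100), ("a", "c", 400), ("b", "c", 50)]

enriched = add_neighborhood_totals(node_features, edges)
```

Repeating the aggregation on the enriched table would pull in two-hop information, which is where the resemblance to message passing in graph neural nets comes from.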

Really appreciate the response. I have not used UMAP before, so thanks for giving some of the background and context. I looked at Graphistry and it looks awesome; I think I do have some use cases where it will be useful :)