Hacker News new | ask | show | jobs
by lmeyerov 2307 days ago
Re:classification... afaict, only in the rare case of complete graphs. You can extract those.. but they're dull. The use of rainbow coloring IS indicative of what folks do for classification & motif finding already on real-world graphs and even tables:

-- Spectral methods: Similar to looking at the top eigen values for PCA, they score the connectivity matrix. I haven't seen a great explanation of this, but the Microsoft Defender talk at BlueHat last year shows a decent place where to use it.

-- Counting types in weakly connected components of a property graphs. Most graphs connect already-classified entities, and tons of methods let you get components out of them. You can get quite far just by making a vector like "cluster 3 has: two reds, three yellows". For a security graph, something like, "3 different IPs, and two alerts of type X". Matt Swann's talk at last year's https://www.graphtheplanet.com/ on a MS internal security team's scalable approach is great here. (We're about to release the last wave of tickets to this year's GraphThePlanet, if you're in SF next week!).

-- Tables: At the same Microsoft BlueHat event as the spectral methods talk, I demoed how to understand the structure behind a log alert dump via a simple & automatic 'hypergraph transform' that lets you use any of these methods even on regular data tables and logs: https://twitter.com/Graphistry/status/1189966458844930048 (jump to 12:45). This is how ~half of our users use Graphistry, and a generalization of what Matt does.

There are a bunch of papers on 'motif mining', and in deep learning, handling graph-y domains via random walks, and they often come down to a variant of #2 above. For deep learning, I'm curious about combining w/ #3 to go from 2010's era ~perception-on-pixels -> 2020 era ~behavior-on-traces.

1 comments

Just re spectral methods on graphs: There are a number of really interesting interpretations of spectral methods including energy stabilization of spring networks and discrete analogues to heat diffusion over grids. The area is really fascinating.

One general interpretation is that the eigenvalues of the adjacency & other matrices that capture the connectivity of the graph are solutions to optimization problems [0]. Usually, these optimization problems and their solutions correspond to relaxations of interesting combinatorial features or operations on graphs - e.g. different ways of cutting graphs to yield binary partitions on the nodes (spectral clustering [1]).

[0] https://en.wikipedia.org/wiki/Min-max_theorem

[1] https://en.wikipedia.org/wiki/Spectral_clustering