Hacker News
by benitorosenberg 2229 days ago
The reason for not doing that is the bias that such sampling introduces.

We are writing a paper on this, but the main point is that you can achieve these two things with minimal loss in classification performance:

1. Speeding up node embedding and classification.
2. Speeding up whole graph embedding and classification.
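As a rough illustration of what node embedding means (this is not the speed-up technique mentioned above, which isn't described here), here is a minimal Python sketch. It assumes a small undirected graph stored as an adjacency dict with nodes numbered 0..n-1 and no isolated nodes. Each node is represented by the average probability that a random walk starting there lands on every node within `window` steps. That is the kind of co-occurrence statistic walk-based methods such as DeepWalk compress into short vectors. The function names are hypothetical, and the sketch skips the compression step, so each vector has one entry per node.

```python
from math import sqrt

def transition_matrix(adj: dict[int, list[int]], n: int) -> list[list[float]]:
    """Row-stochastic matrix: P[i][j] = 1/deg(i) if j is a neighbour of i."""
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in adj.items():
        for j in nbrs:
            P[i][j] = 1.0 / len(nbrs)
    return P

def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Plain dense matrix product, fine for toy-sized graphs."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def walk_proximity_embedding(adj: dict[int, list[int]], n: int, window: int = 2) -> list[list[float]]:
    """Row i = average of P^1 .. P^window, i.e. where walks from i land within `window` steps."""
    P = transition_matrix(adj, n)
    power = P
    total = [row[:] for row in P]
    for _ in range(window - 1):
        power = matmul(power, P)
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, power)]
    return [[x / window for x in row] for row in total]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

# Toy graph: two 4-node cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
emb = walk_proximity_embedding(adj, n=8, window=2)
print(round(cosine(emb[0], emb[1]), 3))  # same clique → 0.949
print(round(cosine(emb[0], emb[5]), 3))  # opposite clique → 0.096
```

Nodes in the same clique end up with similar vectors, which is what makes representations like these usable as classifier inputs or as input to a t-SNE plot.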

Can you speak a little more about how those work? I understand word embeddings conceptually. And I can imagine using a similar process to embed the arbitrary data stored in a graph. Embedding an entire graph makes less sense to me, unless 'entire graph' means a subgraph of the general population.

I do social network stuff occasionally. If I could hypothetically create an embedding representation of everyone, I could imagine it being useful to, say, run t-SNE on all of it instead of using a force layout for visualization. Or maybe use it as a pretty black-box input to a prediction model? I'm wondering if I'm missing something more obvious here.

Entire graph embedding means that you have many smaller graphs (e.g. molecules, transactions, threads) and you want to classify each of them. We created this package, which covers these methods:

https://github.com/benedekrozemberczki/karateclub
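A minimal Python sketch of the whole-graph setting described above, assuming each small graph is a plain adjacency dict. Every graph is mapped to one fixed-length vector (here a hand-crafted degree histogram), and an ordinary classifier (here a nearest-centroid rule) works on those vectors. The feature choice, the classifier, and all function names are hypothetical stand-ins picked for brevity, not the methods implemented in karateclub.

```python
from collections import Counter

Graph = dict[int, list[int]]  # adjacency list of one small graph

def degree_histogram_embedding(g: Graph, max_degree: int = 5) -> list[float]:
    """Fixed-length vector: share of nodes with degree 0..max_degree (last bin means 'max_degree or more')."""
    counts = Counter(min(len(nbrs), max_degree) for nbrs in g.values())
    return [counts[d] / len(g) for d in range(max_degree + 1)]

def fit_centroids(graphs: list[Graph], labels: list[str], max_degree: int = 5) -> dict[str, list[float]]:
    """Average the graph vectors of each class into one centroid per label."""
    sums: dict[str, list[float]] = {}
    sizes = Counter(labels)
    for g, y in zip(graphs, labels):
        vec = degree_histogram_embedding(g, max_degree)
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {y: [x / sizes[y] for x in acc] for y, acc in sums.items()}

def predict(centroids: dict[str, list[float]], g: Graph, max_degree: int = 5) -> str:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = degree_histogram_embedding(g, max_degree)
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(vec, centroids[y])))

# Hypothetical toy data: ring-shaped graphs vs. star-shaped graphs of different sizes.
def cycle(n: int) -> Graph:
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def star(n: int) -> Graph:
    return {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

train = [cycle(4), cycle(5), cycle(6), star(4), star(5), star(6)]
labels = ["ring"] * 3 + ["star"] * 3
centroids = fit_centroids(train, labels)
print(predict(centroids, cycle(7)))  # → ring
print(predict(centroids, star(8)))   # → star
```

Because every graph becomes a vector of the same length regardless of how many nodes it has, a 4-node graph and an 8-node graph can be compared directly; learned whole-graph methods replace the degree histogram with richer features.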