Hacker News
by plonk 1458 days ago
I learned GNNs as a generalisation of CNNs where the data lives on an arbitrary graph, CNNs being the special case where the graph has a grid structure. Convolutions spread information from nodes to their neighbours. It's just implemented differently because the data isn't a neat 4D array anymore.
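
To make the "spread information to neighbours" part concrete, here is a rough sketch in plain Python. It is not any library's API: `grid_graph` and `propagate` are made-up names, and the update is a bare neighbour average with no learned weights or nonlinearity, so treat it as an illustration of the idea rather than a real GCN layer.

```python
# Minimal sketch: one round of neighbour averaging on an arbitrary graph.
# Assumption: no learned weights and no nonlinearity; all names are hypothetical.

def grid_graph(height, width):
    """Adjacency list for a 4-connected pixel grid; node id = row * width + col."""
    neighbours = {}
    for r in range(height):
        for c in range(width):
            adj = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    adj.append(rr * width + cc)
            neighbours[r * width + c] = adj
    return neighbours


def propagate(features, neighbours):
    """Replace each node's value with the mean of itself and its neighbours."""
    out = {}
    for node, adj in neighbours.items():
        group = [node] + adj
        out[node] = sum(features[n] for n in group) / len(group)
    return out


# Same function, two graphs: an image-like grid...
grid = grid_graph(3, 3)
pixels = {n: 0.0 for n in grid}
pixels[4] = 1.0  # single bright pixel in the centre
blurred = propagate(pixels, grid)

# ...and an irregular star-shaped graph that has no grid layout at all.
irregular = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
signal = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
spread = propagate(signal, irregular)
```

On the grid this behaves like a small blur kernel; on the star graph the same code runs unchanged. One caveat: a plain averaging step like this treats all neighbours alike, whereas a CNN kernel can weight "left" and "up" differently, so the grid case is closer to an isotropic convolution than to a general one.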

In that way, GNNs solve graph problems, the same way CNNs solve image processing problems. Training the GNN is more of an optimisation problem.

Edit: maybe graph theory can help with training on very large graphs but I don't know enough about that.

2 comments

Yep, that's a GCN, and probably the most popular GNN is indeed an RGCN :)

The transformer comment is interesting. They're very close, and in general, the tricks people use elsewhere are getting translated to GNNs: convolution, attention, and so on. But scaling is still in progress: over the last couple of years folks have gotten to the 1M-1B level, but not to LLM scale yet. Critically, the scaling work is relatively recent (good GPU implementations, good samplers, etc.) and on a good trajectory.
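
For anyone wondering what a "sampler" does here, a rough sketch of the general idea: instead of touching every neighbour of every node, keep at most a fixed number per hop so a mini-batch stays bounded on a huge graph. This is only an illustration; `sample_neighbourhood` and the `fanouts` parameter are hypothetical names, not the API of any particular library.

```python
import random


def sample_neighbourhood(neighbours, seeds, fanouts, rng):
    """Sample at most `fanout` neighbours per node, hop by hop.

    Returns one node set per hop; layers[0] is the seed set.
    Assumption: `neighbours` maps each node to a list of distinct neighbours.
    """
    layers = [set(seeds)]
    frontier = set(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for node in frontier:
            adj = neighbours.get(node, [])
            k = min(fanout, len(adj))
            next_frontier.update(rng.sample(adj, k))
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


# Toy graph: 20 nodes, each connected to every other node.
graph = {i: [j for j in range(20) if j != i] for i in range(20)}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
layers = sample_neighbourhood(graph, seeds=[0], fanouts=[3, 2], rng=rng)
```

With fanouts of 3 and 2, the two-hop neighbourhood of a seed holds at most 3 + 6 sampled nodes, however dense the graph is, which is what keeps per-batch cost under control.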

We tracked GNNs for years but stayed away until heterogeneity + scaling started to get realistic for commercial workloads, and that's finally happening. Major credit to folks like DeepMind, Michael Bronstein, early GraphSAGE / Jure, various individual researchers, and now AWS + NVIDIA engineers for the practical engineering advances here.

Coming from a position of knowing nothing about graph neural networks, this is the best description I've seen so far in this thread.