by plonk
1458 days ago
I learned them as a generalisation of CNNs to data stored on an arbitrary graph, with CNNs being the special case where the graph is a grid. Convolutions spread information from nodes to their neighbours. It's just implemented differently, because the data is no longer a neat 4D array. In that sense, GNNs solve graph problems the same way CNNs solve image-processing problems. Training the GNN is more of an optimisation problem. Edit: maybe graph theory can help with training on very large graphs, but I don't know enough about that.
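To make the "CNN as a special case" point concrete, here is a minimal sketch in plain Python (no framework; the names `graph_conv`, `grid_graph` and `conv2d_same` are made up for illustration). It assumes the simplest possible layer: scalar node features, one weight for the node itself and one shared weight for the sum over its neighbours. On a 4-neighbour grid this gives the same result as a zero-padded 3x3 convolution with a cross-shaped kernel. One caveat: a real CNN kernel can weight each direction differently, while this layer treats all neighbours alike, so it only matches that symmetric special case.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node id -> neighbour ids


def graph_conv(graph: Graph, h: Dict[int, float],
               w_self: float, w_neigh: float) -> Dict[int, float]:
    """One message-passing layer: every node keeps a weighted copy of its own
    feature and adds a shared-weight sum of its neighbours' features."""
    return {
        v: w_self * h[v] + w_neigh * sum(h[u] for u in graph[v])
        for v in graph
    }


def grid_graph(rows: int, cols: int) -> Graph:
    """The graph implied by an image: each pixel linked to its 4 neighbours."""
    g: Graph = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            g[r * cols + c] = nbrs
    return g


def conv2d_same(img: List[List[float]],
                kernel: List[List[float]]) -> List[List[float]]:
    """Plain 3x3 'same' convolution with zero padding, computed as
    cross-correlation like most deep learning frameworks do."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * img[rr][cc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    rows, cols, a, b = 3, 4, 2.0, 0.5
    img = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]
    h = {r * cols + c: img[r][c] for r in range(rows) for c in range(cols)}

    # GNN layer on the grid graph vs. CNN with a cross-shaped kernel.
    gnn = graph_conv(grid_graph(rows, cols), h, w_self=a, w_neigh=b)
    cnn = conv2d_same(img, [[0.0, b, 0.0], [b, a, b], [0.0, b, 0.0]])
    print(all(abs(gnn[r * cols + c] - cnn[r][c]) < 1e-9
              for r in range(rows) for c in range(cols)))

    # Same layer, irregular graph: only the neighbourhood definition changes.
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(graph_conv(g, {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}, 1.0, 1.0))
```

The same `graph_conv` runs unchanged on the irregular graph, which is the whole point. Real GNN libraries implement this with sparse or scatter/gather operations on the GPU rather than Python loops, precisely because there is no dense 4D tensor to slide a kernel over.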
The transformer comment is interesting. They're very close, and in general the tricks people use elsewhere (convolution, attention, ...) are getting translated to GNNs. Scaling is still in progress: the last couple of years have gotten folks to the 1M-1B level, but not yet to LLM scale. Critically, the scaling work (good GPU implementations, good samplers, etc.) is relatively recent and on a good trajectory.
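How close the two are can be shown with a small sketch. Assuming queries, keys and values are already computed (single head, no learned projections), self-attention is just a message-passing layer on a complete graph: each node takes a softmax-weighted average of its neighbours' values. Restricting the neighbour lists to a sparse graph turns it into an attention-based GNN layer. The scoring below is transformer-style scaled dot product; GAT, for example, uses a different scoring function, so treat this as an illustration rather than any particular library's layer. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention_message_passing(graph: Dict[int, List[int]], q: Dict[int, Vec],
                              k: Dict[int, Vec], v: Dict[int, Vec]) -> Dict[int, Vec]:
    """Each node aggregates its neighbours' values, weighted by a softmax over
    scaled dot-product scores q[node] . k[u] / sqrt(d).
    Assumes every node has at least one neighbour (e.g. a self-loop)."""
    out: Dict[int, Vec] = {}
    for node, nbrs in graph.items():
        d = len(q[node])
        weights = softmax([dot(q[node], k[u]) / math.sqrt(d) for u in nbrs])
        dim = len(v[nbrs[0]])
        out[node] = [sum(w * v[u][i] for w, u in zip(weights, nbrs))
                     for i in range(dim)]
    return out


def complete_graph(n: int) -> Dict[int, List[int]]:
    """Every node is a neighbour of every node, itself included."""
    return {i: list(range(n)) for i in range(n)}


def dense_self_attention(Q: List[Vec], K: List[Vec], V: List[Vec]) -> List[Vec]:
    """Textbook single-head attention, softmax(Q K^T / sqrt(d)) V, row by row."""
    d = len(Q[0])
    out = []
    for qi in Q:
        w = softmax([dot(qi, kj) / math.sqrt(d) for kj in K])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, V))
                    for c in range(len(V[0]))])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    q, k, v = dict(enumerate(Q)), dict(enumerate(K)), dict(enumerate(V))

    # Complete graph: same numbers as dense self-attention.
    print(attention_message_passing(complete_graph(3), q, k, v))
    print(dense_self_attention(Q, K, V))

    # Sparse graph: node 1 only attends to itself, so it keeps its own value.
    print(attention_message_passing({0: [0, 1], 1: [1], 2: [2, 0]}, q, k, v))
```

The complete-graph case needs a score for every pair of nodes, which grows quadratically with the number of nodes; that is one reason sparse neighbourhoods and neighbour sampling matter so much when scaling GNNs to large graphs.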
We tracked GNNs for years but stayed away until heterogeneity and scaling started to become realistic for commercial workloads, and that's finally happening. Major credit to folks like DeepMind, Michael Bronstein, the early GraphSAGE work / Jure, various individual researchers, and now AWS and NVIDIA engineers for the practical engineering advances here.