| I am really interested in GNNs in the context of compilers. * Predicting the color of a node in a graph could be used, for example, for speculative devirtualization. * Predicting edge weights could give us a better static estimate of hot basic blocks. * Running performance experiments is as easy as running the benchmark and defining some performance metric that you can feed back to the GNN to learn from. Imagine this also for debugging and IDEs. I haven't played with Copilot, but I imagine that something like guessing the graph based on node labels and some of the edges is feasible using GNNs? This means that the IDE could try to match the names of your variables, the control flow, and the name of the function to other known functions and could potentially point out differences, potentially giving us a way to fix logic errors or suggest better algorithms. E.g., "Mmm... it looks like you are implementing your own bubble sort from scratch. Would you like to click here and use <insert better sort>?" I am not an expert on GNNs, but if anyone has resources for someone to learn more about the state of the art of GNNs (a link to a literature review or something similar), do let me know. |
Copilot is based on OpenAI Codex, which is based on GPT-3, which is a transformer model.
Although, technically, transformers can be viewed as GNNs over a "fully connected" graph (in the graph theory sense), I don't think that supports your speculation here about how GNNs could be used for code analysis, since the "tokens" that GPT-3 is trained on are not programming-language syntactic constructs but sub-word units learned from raw text (byte-pair encoding, similar in spirit to WordPiece).
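To make the "transformers are GNNs on a fully connected graph" point concrete, here is a minimal sketch in plain Python. It assumes a single attention head with no positional encoding and no masking, and every name in it (`attention_dense`, `attention_message_passing`, the toy sizes) is made up for illustration. It computes self-attention twice, once in the usual matrix form and once as per-node message passing over an explicit edge list, and checks that the two agree when the edge list is the complete graph.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention_dense(X, Wq, Wk, Wv):
    # Usual matrix form: softmax(Q K^T / sqrt(d)) V
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    Kt = [list(col) for col in zip(*K)]
    scores = [[s / scale for s in row] for row in matmul(Q, Kt)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)

def attention_message_passing(X, Wq, Wk, Wv, edges):
    # GNN view: each node dst aggregates messages from its in-neighbours,
    # weighted by a softmax over the incoming edges only.
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    out = []
    for dst in range(len(X)):
        srcs = [s for s, t in edges if t == dst]
        scores = [sum(q * k for q, k in zip(Q[dst], K[s])) / scale for s in srcs]
        weights = softmax(scores)
        out.append([sum(w * V[s][j] for w, s in zip(weights, srcs))
                    for j in range(len(V[0]))])
    return out

random.seed(0)
n, d_in, d = 5, 6, 4  # toy sizes, chosen arbitrarily
X = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d_in)]
              for _ in range(3))

# Complete directed graph with self-loops: every token "sees" every token.
complete_edges = [(s, t) for s in range(n) for t in range(n)]

dense_out = attention_dense(X, Wq, Wk, Wv)
mp_out = attention_message_passing(X, Wq, Wk, Wv, complete_edges)
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for ra, rb in zip(dense_out, mp_out) for a, b in zip(ra, rb))
print(same)
```

Swapping `complete_edges` for a sparser edge list (say, the edges of an AST or CFG) turns the same function into attention restricted to a graph, which is roughly the graph-attention flavour of GNN. A decoder-only model like GPT-3 also masks attention so each token only looks backwards, so "fully connected" is a simplification there too.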
I will say, though, that I am equally excited by the exact prospect you raised of using something like GNNs for code analysis.
My hunch is that if somebody can figure out a way to make training hierarchical/graph-based neural networks very fast, we'll observe the same gains that we did with transformers. But hierarchical/graph-based models don't lend themselves to efficient computation: their irregular, sparse structure maps poorly onto the dense, regular matrix multiplications that GPUs are built for.
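A rough sketch of where the efficiency tension comes from, using a made-up five-block control-flow graph (the edges, features, and function names are all hypothetical). One round of neighbour aggregation can be done per edge, which touches only the edges that exist but jumps around memory, or as a dense adjacency-matrix multiply, which is regular but does work for every possible pair of nodes.

```python
# Toy CFG: (source block, destination block). Purely illustrative.
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 1)]
num_nodes = 5
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 0.5], [1.0, 1.0]]

def aggregate_sparse(features, edges, num_nodes):
    # One gather + scatter per edge: work is proportional to |E|,
    # but the memory access pattern depends on the graph.
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        for j in range(dim):
            out[dst][j] += features[src][j]
    return out

def aggregate_dense(features, edges, num_nodes):
    # Same result as A @ X with a dense adjacency matrix: regular access,
    # but work is proportional to N^2 even though most entries are zero.
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[dst][src] = 1.0
    dim = len(features[0])
    return [[sum(adj[i][k] * features[k][j] for k in range(num_nodes))
             for j in range(dim)] for i in range(num_nodes)]

sparse_out = aggregate_sparse(features, edges, num_nodes)
dense_out = aggregate_dense(features, edges, num_nodes)

# Multiply-add counts for one aggregation round.
sparse_work = len(edges) * len(features[0])
dense_work = num_nodes * num_nodes * len(features[0])
print(sparse_out == dense_out, sparse_work, dense_work)
```

On a graph this small the difference is irrelevant, but for program graphs with many nodes and few edges per node the gap between the two counts grows quickly, and batching many differently shaped graphs together adds further bookkeeping that fixed-length token sequences avoid.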