| HN Mirror

Too late to edit this, but I should clarify: "data structure" is probably not the right word. To be more precise, "SSA form" is a property of variables in a program IR. It means variables are assigned only once, that defs dominate uses, and value flow is merged at control flow merge points with phi nodes. You can have different program representations that all represent values in SSA form, but differ in how they represent other things. Where the "sea of nodes" representation differs is that it explicitly represents control dependencies. In LLVM, you always have a control flow graph, with basic blocks and edges between them. Control dependencies between instructions are implicit from their placement in particular basic blocks. In a "sea of nodes" IR, there are no basic blocks.[1] Control dependencies are represented explicitly with control inputs to nodes, just as data dependencies are represented explicitly with data inputs.

This makes certain things easier in a "sea of nodes" IR. Normally, during optimization you don't have to worry about maintaining a legal schedule of instructions within and between the basic blocks. You just have to respect the control dependencies. However, in order to get executable code you have to impose a schedule on the nodes, whereas with a more conventional CFG IR, you already have a schedule in the form of the basic blocks and the ordering within them.

[1] See Section 2.2-2.4 of Click's paper: http://paperhub.s3.amazonaws.com/24842c95fb1bc5d7c5da2ec735e.... His IR replaces basic blocks with Region nodes. The only instructions that must be linked to Region nodes are ones that inherently have a control dependency. E.g. an "If" node takes a data input and produces two control outputs, which can be consumed by region nodes. "Phi" nodes must also have control inputs, so they can properly associate different control flow paths with the data values that are merged along those paths.