Figure 5(c) illustrates the shape of the projections for a random branching embedding of the correct tree structure. This roughly matches the ideal Pythagorean embedding, and also the BERT embedding. Keep in mind that BERT only sees word sequences, with no explicit notion of a tree structure. In theory, there are O(N^3) possible parse trees, which are not completely arbitrary graphs, but rather have a context-free structure. Thus figure 5(d) is too weak a baseline: its embeddings are picked completely at random, with no tree-based constructive process. I wish there were a figure 5(e) showing a random branching embedding of a random parse tree, to give a sense of how much the final result depends on randomly embedding the right parse tree vs. randomly embedding some random parse tree. The hard problem in parsing is finding the right tree...
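To make the "figure 5(e)" idea concrete, here is a rough sketch of how one might build it. It rests on assumptions that are mine, not the article's: a random branching embedding is taken to mean "each node sits at its parent's position plus an independent random Gaussian offset", and a random parse tree is sampled by repeatedly merging two adjacent spans (which is not a uniform distribution over trees). All function names are hypothetical.

```python
import random


def random_binary_tree(n_leaves, rng):
    """Sample a random binary bracketing over leaves 0..n_leaves-1.

    Assumption: trees are built by repeatedly merging two adjacent
    subtrees picked at random; this is NOT uniform over all trees.
    Returns (parent, root), where parent maps each non-root node to its parent.
    """
    parent = {}
    roots = list(range(n_leaves))
    next_id = n_leaves
    while len(roots) > 1:
        i = rng.randrange(len(roots) - 1)
        parent[roots[i]] = next_id
        parent[roots[i + 1]] = next_id
        roots[i:i + 2] = [next_id]  # the merged span replaces its two children
        next_id += 1
    return parent, roots[0]


def random_branching_embedding(parent, root, dim, rng):
    """Place each node at its parent's position plus a random Gaussian offset.

    Offsets are scaled so that each edge has expected squared length 1.
    """
    emb = {root: [0.0] * dim}
    scale = dim ** -0.5
    # Parents always have larger ids than their children, so visiting
    # nodes in decreasing id order guarantees the parent is embedded first.
    for node in sorted(parent, reverse=True):
        base = emb[parent[node]]
        emb[node] = [x + rng.gauss(0.0, 1.0) * scale for x in base]
    return emb


def tree_distance(parent, a, b):
    """Number of edges on the path between nodes a and b."""
    def path_to_root(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path

    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + j


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


rng = random.Random(0)
parent, root = random_binary_tree(8, rng)
emb = random_branching_embedding(parent, root, dim=2000, rng=rng)
print(tree_distance(parent, 0, 7), round(squared_distance(emb[0], emb[7]), 2))
```

In high dimensions the squared Euclidean distance between two nodes should land close to their tree distance, which is the Pythagorean property. For the 5(e) comparison, one would embed a tree from `random_binary_tree` but measure `tree_distance` against the correct parse, and see how much of the structure survives.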