Hacker News
by jf- 2567 days ago
I’m impressed by the method of mapping higher-dimensional vectors to a consistent tree representation, but I’m not sure what the take-home point is after that. Are the BERT embeddings just (possibly random) branching structures? I’m only eyeballing figure 5 here, but the BERT embeddings seem to approximate the dependency parse tree only to the same extent that the random trees do.

Figure 5(c) illustrates the shape of the projections for a random branching embedding of the correct tree structure. This roughly matches the ideal Pythagorean embedding, and also the BERT embedding. Keep in mind that BERT only sees word sequences, with no explicit notion of a tree structure. In theory, the number of possible parse trees grows exponentially with sentence length (CKY-style parsing can search over them in O(N^3) time), but they are not completely arbitrary graphs: they have a context-free structure. Thus figure 5(d) is too weak a baseline, since its embeddings are picked completely randomly, with no tree-based constructive process. I wish there were a figure 5(e) showing a random branching embedding of a random parse tree, to give a sense of how much randomly embedding the right parse tree vs. randomly embedding some random parse tree influences the final result. The hard problem in parsing is finding the right tree...
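To make that comparison concrete, here is a minimal sketch, assuming the constructions work the way I read them in the paper: in a Pythagorean embedding every tree edge gets its own orthogonal unit vector and each word is the sum of the edge vectors on its path from the root, so squared Euclidean distance equals tree distance exactly; in a random branching embedding the edge vectors are random unit vectors, which are only nearly orthogonal in high dimension. The last line is a crude stand-in for the wished-for figure 5(e): a random branching embedding of a random tree, scored against the correct tree. The example tree, the 1024 dimensions, the hypothetical `random_tree` baseline (which does not restrict itself to projective or context-free trees), and the mean-error score are all illustrative assumptions, not the paper's method or its projection plots.

```python
import math
import random


def tree_distances(parent):
    """All-pairs path lengths for a tree given as a parent list (the root's parent is -1)."""
    n = len(parent)
    adj = [[] for _ in range(n)]
    for child, p in enumerate(parent):
        if p >= 0:
            adj[child].append(p)
            adj[p].append(child)
    dist = []
    for source in range(n):
        d = [-1] * n
        d[source] = 0
        frontier = [source]
        for u in frontier:  # breadth-first search; the list grows as we iterate
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    frontier.append(v)
        dist.append(d)
    return dist


def embed(parent, edge_vectors):
    """Each node's vector is the sum of the edge vectors on its path from the root.
    edge_vectors[i] belongs to the edge (parent[i], i); the root's entry is unused."""
    dim = len(edge_vectors[0])
    emb = [None] * len(parent)

    def node(i):
        if emb[i] is None:
            if parent[i] < 0:
                emb[i] = [0.0] * dim
            else:
                emb[i] = [a + b for a, b in zip(node(parent[i]), edge_vectors[i])]
        return emb[i]

    return [node(i) for i in range(len(parent))]


def pythagorean_edges(n):
    """Mutually orthogonal unit vectors: edge i gets the i-th standard basis vector of R^n."""
    return [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]


def random_edges(n, dim, rng):
    """Gaussian directions normalized to unit length; nearly orthogonal when dim is large."""
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def random_tree(n, rng):
    """Hypothetical baseline: node 0 is the root and node i attaches to a random earlier node.
    This does not restrict itself to projective or context-free trees."""
    return [-1] + [rng.randrange(i) for i in range(1, n)]


def mean_error(emb, dist):
    """Mean |squared Euclidean distance - tree distance| over all node pairs."""
    errs = []
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            sq = sum((a - b) ** 2 for a, b in zip(emb[i], emb[j]))
            errs.append(abs(sq - dist[i][j]))
    return sum(errs) / len(errs)


rng = random.Random(0)
gold = [1, 2, -1, 4, 2, 6, 4]  # hypothetical 7-word dependency tree; word 2 is the root
n = len(gold)
gold_dist = tree_distances(gold)

pyth = embed(gold, pythagorean_edges(n))               # ideal Pythagorean embedding
rand_branch = embed(gold, random_edges(n, 1024, rng))  # random branching, correct tree (cf. 5(c))
wrong_tree = embed(random_tree(n, rng), random_edges(n, 1024, rng))  # random branching, random tree

print(mean_error(pyth, gold_dist))  # 0.0: squared distance equals tree distance exactly
print(mean_error(rand_branch, gold_dist))
print(mean_error(wrong_tree, gold_dist))
```

The point of the design is that the embedding procedure is held fixed across the last two lines; only whether it is applied to the right tree changes, which is the effect a figure 5(e) would isolate.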
A huge question in NLP is how the discrete symbolic structures that we think characterize natural language can be embedded in high-dimensional continuous space. This paper proposes a solution that justifies some ad-hoc results from before and could form the basis for better means of embedding in the future.