by 1024core 3504 days ago
I read the full post, thanks for writing it. It is very clear, but I do have a couple of questions:

1. In Step (2), Bidirectional RNN: what are you making the forward/backward passes over? How do the tokens get turned into a "matrix"? What is the dimensionality of this matrix?

2. Step 3 is a bit unclear. Where do Parikh et al. get their 2 matrices from?

It would be nice to bring in some concreteness: talk about sentences, documents, etc. and how they map into this scheme.

Thanks!


The implementation and papers are probably much clearer about the details. This post might also help: https://explosion.ai/blog/spacy-deep-learning-keras

I'll answer briefly about the Parikh et al. model.

1) Input: (ids1, ids2). These are integer-typed arrays of token IDs, of length len1 and len2 respectively.

2) sent1 = embed(ids1); sent2 = embed(ids2). The data are now real-valued arrays of shape (len1, vector_dim) and (len2, vector_dim) respectively, with one row per token; this is the "matrix" each sentence becomes. 300 is a common value for vector_dim, e.g. with the GloVe Common Crawl vectors.

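3) sent1 = encode(sent1); sent2 = encode(sent2), where encode is the bidirectional RNN: one pass runs forward over the rows (tokens), another runs backward, and the two outputs are concatenated at each position. The data are now real-valued arrays of shape (len1, fwd_dim+bwd_dim) and (len2, fwd_dim+bwd_dim).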

4a) attention = create_attention_matrix(sent1, sent2). This is a real-valued array of shape (len1, len2), with one score for each pair of tokens from the two sentences.

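4b) align1 = soft_align(sent2, attention); align2 = soft_align(sent1, transpose(attention)). Row i of align1 is an attention-weighted average of the rows of sent2, and row j of align2 is an attention-weighted average of the rows of sent1, so these are real-valued arrays of shape (len1, fwd_dim+bwd_dim) and (len2, fwd_dim+bwd_dim).

4c) feats1 = sum(map(compare(sent1, align1))); feats2 = sum(map(compare(sent2, align2))). compare is applied to each position's pair of vectors and the results are summed over positions, giving real-valued arrays of shape (predict_dim,) and (predict_dim,).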

5) class_id = predict(feats1, feats2)
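To make the shapes concrete, here is a minimal NumPy sketch of steps 1-5 with random, untrained weights. It is not the spaCy/Keras implementation: the sizes, the plain tanh RNN standing in for encode, the one-layer ReLU networks F and G, the linear output layer, and the helper names (dense, make_rnn, softmax, E, W_out) are all hypothetical choices for illustration. Only the step names come from the outline, and their signatures here are assumptions. In this sketch soft_align(other, attention) returns, for each token of one sentence, an attention-weighted average of the other sentence's vectors, which is why align1 is built from sent2 and then compared against sent1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only.
vocab_size, vector_dim = 1000, 300   # 300-d vectors, as with GloVe
hidden = 64                          # fwd_dim == bwd_dim == hidden (assumption)
predict_dim, n_classes = 100, 3

def dense(n_in, n_out):
    """One-layer ReLU feed-forward net with random (untrained) weights."""
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    return lambda x: np.maximum(0.0, x @ W)

def make_rnn(n_in, n_hid):
    """Plain tanh RNN (stand-in for an LSTM/GRU); one hidden state per input row."""
    Wx = rng.normal(scale=0.1, size=(n_in, n_hid))
    Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))
    def run(x):
        h = np.zeros(n_hid)
        states = []
        for x_t in x:  # one step per token (row)
            h = np.tanh(x_t @ Wx + h @ Wh)
            states.append(h)
        return np.stack(states)
    return run

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

E = rng.normal(size=(vocab_size, vector_dim))  # stand-in for pretrained vectors
fwd_rnn, bwd_rnn = make_rnn(vector_dim, hidden), make_rnn(vector_dim, hidden)
F = dense(2 * hidden, hidden)          # projects each token before attention
G = dense(4 * hidden, predict_dim)     # compares a token with its aligned vector
W_out = rng.normal(scale=0.1, size=(2 * predict_dim, n_classes))

def embed(ids):
    return E[ids]  # (len, vector_dim)

def encode(sent):
    fwd = fwd_rnn(sent)                # left-to-right pass
    bwd = bwd_rnn(sent[::-1])[::-1]    # right-to-left pass, restored to token order
    return np.concatenate([fwd, bwd], axis=1)  # (len, 2 * hidden)

def create_attention_matrix(sent1, sent2):
    return F(sent1) @ F(sent2).T  # (len1, len2): one score per token pair

def soft_align(other, attention):
    # Row i: attention-weighted average of the other sentence's rows.
    return softmax(attention, axis=1) @ other

def compare(sent, aligned):
    # G is applied to each position's [token; aligned] vector independently.
    return G(np.concatenate([sent, aligned], axis=1))  # (len, predict_dim)

def predict(feats1, feats2):
    logits = np.concatenate([feats1, feats2]) @ W_out
    return int(np.argmax(logits))

ids1 = np.array([4, 8, 15, 16])                    # len1 = 4
ids2 = np.array([23, 42, 7])                       # len2 = 3
sent1, sent2 = embed(ids1), embed(ids2)            # (4, 300), (3, 300)
sent1, sent2 = encode(sent1), encode(sent2)        # (4, 128), (3, 128)
attention = create_attention_matrix(sent1, sent2)  # (4, 3)
align1 = soft_align(sent2, attention)              # (4, 128)
align2 = soft_align(sent1, attention.T)            # (3, 128)
feats1 = compare(sent1, align1).sum(axis=0)        # (100,)
feats2 = compare(sent2, align2).sum(axis=0)        # (100,)
class_id = predict(feats1, feats2)                 # an int in [0, 3)
```

Each row of the token arrays is one token, which is also what the RNN passes iterate over; swapping the toy RNN for an LSTM or GRU changes the weights but not the shapes.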

The post describes steps 4a, 4b and 4c as a single operation that takes the two 2-dimensional sentence representations as input and outputs a single vector (obtained by concatenating the representations feats1 and feats2 in this description).
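For reference, that combined operation corresponds to the attend, compare and aggregate equations of the Parikh et al. paper. In its notation (with ā_i as row i of sent1, b̄_j as row j of sent2, and F, G, H feed-forward networks), the attention scores, soft alignments, summed comparison vectors and final prediction are roughly:

```latex
\begin{aligned}
e_{ij} &= F(\bar a_i)^\top F(\bar b_j)
  && \text{attention score, shape } (\mathrm{len1}, \mathrm{len2}) \\
\beta_i &= \sum_{j=1}^{\mathrm{len2}} \frac{\exp(e_{ij})}{\sum_{k=1}^{\mathrm{len2}} \exp(e_{ik})}\, \bar b_j
  && \text{sent2 soft-aligned to token } i \text{ of sent1} \\
\alpha_j &= \sum_{i=1}^{\mathrm{len1}} \frac{\exp(e_{ij})}{\sum_{k=1}^{\mathrm{len1}} \exp(e_{kj})}\, \bar a_i
  && \text{sent1 soft-aligned to token } j \text{ of sent2} \\
v_1 &= \sum_{i=1}^{\mathrm{len1}} G([\bar a_i; \beta_i]), \quad
v_2 = \sum_{j=1}^{\mathrm{len2}} G([\bar b_j; \alpha_j])
  && \text{feats1, feats2} \\
\hat y &= H([v_1; v_2])
  && \text{prediction from the concatenated vector}
\end{aligned}
```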