|
|
|
|
|
by boto3
3573 days ago
|
|
Could you elaborate on "we learn high dimensional embeddings for each video in a fixed vocabulary and feed these embeddings into a feedforward neural network." So, each video is mapped to fixed size vector of floats? A user's history is now a matrix of size [number of videos, embedding size]? What are the other parameters in this sentence "Importantly, the embeddings are learned jointly with all other model parameters through normal gradient descent back propagation updates."? And how do you concatenate all these into a "wide layer" when users would have histories of different length? |
|
This is of course not optimal, as the network should be able to learn how best to summarize the sequence. In the paper, however, we emphasize the importance of withholding certain sequential information from the classifier.