


	by boto3 3573 days ago
	Could you elaborate on "we learn high dimensional embeddings for each video in a fixed vocabulary and feed these embeddings into a feedforward neural network."? So, is each video mapped to a fixed-size vector of floats? Is a user's history then a matrix of size [number of videos, embedding size]? What are the other parameters in the sentence "Importantly, the embeddings are learned jointly with all other model parameters through normal gradient descent back propagation updates."? And how do you concatenate all of these into a "wide layer" when users have histories of different lengths?


pcovington 3573 days ago

Figure 3 illustrates that the variable-sized watch history is combined with an average operation. This is partly why the embeddings need to be so large: to retain information after averaging, you need lots of dimensions to spread out disparate items.
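
To make the averaging step concrete, here is a minimal sketch in plain Python. It is not the paper's code: the table size, embedding width, random initialization, and the name `average_watch_history` are all assumptions chosen for illustration, and the zero vector for an empty history is likewise an assumption. It only shows how watch histories of different lengths collapse to one fixed-size vector that a feedforward network can consume.

```python
import random

EMBED_DIM = 8      # hypothetical; the comment above says real embeddings are large
VOCAB_SIZE = 100   # hypothetical size of the fixed video vocabulary

rng = random.Random(0)
# One row of floats per video ID. In a real model these rows would be
# learned by backpropagation; here they are just random placeholders.
embedding_table = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def average_watch_history(video_ids):
    """Map a variable-length list of video IDs to one fixed-size vector."""
    if not video_ids:
        return [0.0] * EMBED_DIM  # assumption: empty history maps to zeros
    rows = [embedding_table[v] for v in video_ids]
    # Element-wise mean across all watched videos.
    return [sum(column) / len(rows) for column in zip(*rows)]

short_user = average_watch_history([3, 17])
long_user = average_watch_history([3, 17, 42, 8, 99, 5])
print(len(short_user), len(long_user))
```

Both users end up with a vector of length `EMBED_DIM`, regardless of how many videos they watched. The average is also order-invariant, so any sequence information in the history is discarded at this step.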

This is of course not optimal, as the network should be able to learn how best to summarize the sequence. In the paper, however, we emphasize the importance of withholding certain sequential information from the classifier.


bearzoo 3573 days ago

Have you experimented with replacing the averaging operation on the vectors with a recurrent network such as an LSTM? That way you do not ignore the temporal nature of the feedback (I have had success improving metrics doing this on implicit streaming video feedback).
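
For readers unfamiliar with the suggestion, the following is a rough sketch of a recurrent summarizer in plain Python: a single standard LSTM cell, with random untrained weights, run over a sequence of embedding vectors. It is not taken from the paper or from the commenter's system; the dimensions, the weight initialization, and the name `lstm_summarize` are assumptions for illustration only.

```python
import math
import random

INPUT_DIM = 4    # hypothetical embedding size
HIDDEN_DIM = 3   # hypothetical LSTM state size

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

# One weight matrix and bias per gate, applied to [input, previous hidden].
GATES = ("input", "forget", "output", "candidate")
weights = {g: rand_matrix(HIDDEN_DIM, INPUT_DIM + HIDDEN_DIM) for g in GATES}
biases = {g: [0.0] * HIDDEN_DIM for g in GATES}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(gate, vec):
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights[gate], biases[gate])]

def lstm_summarize(sequence):
    """Run the (untrained) LSTM cell over the sequence; return the final hidden state."""
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for x in sequence:
        z = list(x) + h
        i = [sigmoid(a) for a in affine("input", z)]
        f = [sigmoid(a) for a in affine("forget", z)]
        o = [sigmoid(a) for a in affine("output", z)]
        g = [math.tanh(a) for a in affine("candidate", z)]
        # Cell state: keep part of the old state, add part of the candidate.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

watch_a = [1.0, 0.0, 0.0, 0.0]
watch_b = [0.0, 1.0, 0.0, 0.0]
forward = lstm_summarize([watch_a, watch_b])
backward = lstm_summarize([watch_b, watch_a])
print(forward != backward)
```

Unlike an average, the final hidden state depends on the order of the inputs, so the same two videos watched in a different order produce a different summary. The output is still a fixed-size vector, so it could feed the same downstream layers.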
