by pcovington
3573 days ago
Figure 3 illustrates that the variable-sized watch history is combined by averaging. This is partly why the embeddings need to be so large: to retain information after averaging, you need many dimensions to spread out disparate items. This is of course not optimal, as the network should be able to learn how best to summarize the sequence. In the paper, however, we emphasize the importance of withholding certain sequential information from the classifier.
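
To make the averaging step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `average_watch_embedding`, the toy 2-dimensional embedding table, and the choice to skip unknown videos are all assumptions made for illustration. It shows two properties discussed above: histories of any length collapse to one fixed-size vector, and the order of watches is discarded.

```python
from typing import Dict, List, Sequence


def average_watch_embedding(
    watch_history: Sequence[str],
    embedding_table: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Collapse a variable-length watch history into one fixed-size vector."""
    total = [0.0] * dim
    count = 0
    for video_id in watch_history:
        vector = embedding_table.get(video_id)
        if vector is None:
            # Assumption of this sketch: unknown videos are simply skipped.
            continue
        for i in range(dim):
            total[i] += vector[i]
        count += 1
    if count == 0:
        # An empty (or fully unknown) history maps to the zero vector.
        return total
    return [x / count for x in total]


# Hypothetical toy embeddings; real embeddings would have many more dimensions.
toy_embeddings = {
    "video_a": [1.0, 0.0],
    "video_b": [0.0, 1.0],
    "video_c": [1.0, 1.0],
}

short_history = average_watch_embedding(
    ["video_a", "video_b"], toy_embeddings, dim=2
)
long_history = average_watch_embedding(
    ["video_a", "video_b", "video_c", "video_c"], toy_embeddings, dim=2
)
reordered = average_watch_embedding(
    ["video_b", "video_a"], toy_embeddings, dim=2
)

print(short_history)  # [0.5, 0.5]
print(long_history)   # [0.75, 0.75]
print(reordered)      # [0.5, 0.5], same as short_history: order is lost
```

With only two dimensions, many different histories land on the same averaged vector, which is the intuition for why more dimensions help keep disparate items apart after averaging.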