| HN Mirror

the embedding layer is the layer that converts the one hot word feature in to a continuous multi dimensional vector that the deep net can learn with. they used to pretrain that layer separately with word2vec. now as it's just a neural net layer, they let the translation model train it with backprop on the main (translation /dialog / qa, etc) task as a regular layer