by remexre 896 days ago
Should the line

    Z_encoder_decoder = layer_norm(Z_encoder_decoder + Z)

in Decoder step 7 instead be

    Z_encoder_decoder = layer_norm(Z_encoder_decoder + Z_self_attention)
? Also, is layer_norm missing in Decoder step 8...
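
To make the question concrete, here is a minimal sketch of the residual wiring I would expect. It assumes the usual post-layer-norm arrangement from the original Transformer, where each sublayer's output is added to that sublayer's own input and then normalized. The three sublayer callables and the name `decoder_residuals` are hypothetical stand-ins, and treating the last step as a feed-forward sublayer is my guess, not something taken from the article.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [p + q for p, q in zip(a, b)]


def decoder_residuals(Z, self_attention, encoder_decoder_attention, feed_forward):
    """Apply three sublayers, each wrapped as layer_norm(sublayer(x) + x).

    The sublayers are passed in as plain callables (hypothetical placeholders),
    so only the residual wiring is shown here.
    """
    # Self-attention: the residual adds the layer input Z.
    Z_self_attention = layer_norm(add(self_attention(Z), Z))

    # Encoder-decoder attention: the residual adds Z_self_attention, not Z.
    Z_encoder_decoder = layer_norm(
        add(encoder_decoder_attention(Z_self_attention), Z_self_attention)
    )

    # Assumed feed-forward step: the residual adds Z_encoder_decoder,
    # followed by layer_norm again.
    Z_out = layer_norm(add(feed_forward(Z_encoder_decoder), Z_encoder_decoder))
    return Z_self_attention, Z_encoder_decoder, Z_out
```

With this wiring, every residual connection skips exactly one sublayer, which is why adding the original `Z` in the encoder-decoder step looks off to me.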