by mike_hearn
309 days ago
I'm not a PyTorch expert, so this is most likely a newbie error, but could someone explain the code in Figure 7? The code circled as "4 x emb_dim" doesn't seem to apply a 4x multiplier anywhere. In fact, the layer definitions of fc1 and fc2 in the SwiGLU variant appear to be identical to the code in the regular feed-forward block. What makes the two layers in the second code snippet a different size from fc1 in the first?