by hrbigelow
2104 days ago
One idea in the "Hopfield Networks is All You Need" paper is that the softmax-based attention mechanism is equivalent to a Hopfield energy update in which the attention keys are the Hopfield "memories". But the keys are produced as a transformation of the input, so it seems to me that the Transformer does not actually store keys as "memories" the way a Hopfield network stores memories (as energy minima). Is this correct, or am I missing something about the paper?
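To make the question concrete, here is a minimal NumPy sketch of the equivalence as I understand it. It rests on simplifying assumptions that are mine, not the paper's: values are set equal to keys, beta is 1/sqrt(d), and the names `hopfield_update`, `attention`, `W_q` and `W_k` are hypothetical. It shows that one softmax retrieval step over a set of patterns matches scaled dot-product attention, and that the "patterns" here are recomputed from whatever input comes in.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_update(queries, patterns, beta):
    # One retrieval step: each query moves to a softmax-weighted
    # average of the patterns (rows of `patterns`).
    return softmax((beta * queries) @ patterns.T) @ patterns

def attention(Q, K, V):
    # Standard scaled dot-product attention.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

rng = np.random.default_rng(0)
d = 8
W_q = rng.normal(size=(d, d))  # hypothetical learned projections
W_k = rng.normal(size=(d, d))

inputs = rng.normal(size=(5, d))  # hypothetical token embeddings
Q = inputs @ W_q
K = inputs @ W_k  # the "memories" are computed from the input

# Assumption: V = K and beta = 1/sqrt(d), so the two rules coincide.
same = np.allclose(hopfield_update(Q, K, 1 / np.sqrt(d)), attention(Q, K, K))

# A different input yields a different set of "memories".
other_inputs = rng.normal(size=(5, d))
memories_depend_on_input = not np.allclose(K, other_inputs @ W_k)
```

In this sketch only `W_q` and `W_k` persist between inputs; `K` itself is rebuilt every time, which is the distinction I'm asking about.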