Hacker News
by hrbigelow 2104 days ago
One idea in the "Hopfield Networks is All You Need" paper is that the softmax-based attention mechanism is equivalent to a Hopfield energy update in which the attention keys are the Hopfield "memories". But the keys are produced as a transformation of the input, so it seems to me that the Transformer does not actually store keys as "memories" the way a Hopfield network stores memories (as energy minima). Is this correct, or am I missing something about the paper?
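To make the question concrete, here is a minimal NumPy sketch of the correspondence as I understand it. It assumes the paper's update rule has the form "new state = memories weighted by softmax(beta * memories · state)", and it simplifies attention by using the keys as the values as well. The names `hopfield_update`, `attention`, and `W_K` are my own illustrative choices, not anything taken from the paper's code.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_update(state, memories, beta):
    # One retrieval step: state is (d,), memories is (n, d).
    # Returns a softmax-weighted average of the stored patterns.
    return softmax(beta * (memories @ state)) @ memories

def attention(Q, K, V):
    # Standard scaled dot-product attention for Q (m, d), K (n, d), V (n, d).
    d = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

rng = np.random.default_rng(0)
n, d = 5, 8
W_K = rng.normal(size=(d, d))   # hypothetical learned key projection
X = rng.normal(size=(n, d))     # one input sequence
K = X @ W_K                     # keys are computed from the input
q = rng.normal(size=d)          # a single query / Hopfield state

# With beta = 1/sqrt(d) and values set equal to the keys (a simplification),
# one Hopfield update matches one attention read.
hop = hopfield_update(q, K, beta=1.0 / np.sqrt(d))
att = attention(q[None, :], K, K)[0]
same = np.allclose(hop, att)

# A different input yields different "memories"; only W_K persists.
X2 = rng.normal(size=(n, d))
K2 = X2 @ W_K
keys_change = not np.allclose(K, K2)
```

In this sketch the only thing that persists across inputs is `W_K`; the "memories" `K` are recomputed from each input, which is exactly the point I'm asking about.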