by silencedogood3 1487 days ago
Neat! Can you explain what the kNN is doing? I can’t quite follow the paper.

It's a sparse attention scheme. They store and reuse activations, thus "memorising" the past without the need for training. To keep the sequence short enough to fit into memory, they only recall the k most similar memories from a much larger log.
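
To make the recall step concrete, here is a rough NumPy sketch, not the paper's code. The function name `knn_memory_attention`, the dot-product similarity, the exact (brute-force) search and the 1/sqrt(d) scaling are all my assumptions for illustration; a real system would probably use an approximate nearest-neighbour index rather than scoring the whole log.

```python
import numpy as np

def knn_memory_attention(query, mem_keys, mem_values, k):
    """Attend over only the k stored keys most similar to `query`.

    query:      (d,)   current activation used as the lookup key
    mem_keys:   (n, d) log of stored keys (n can be much larger than k)
    mem_values: (n, v) values stored alongside those keys
    """
    # Similarity of the query to every stored key (dot product, assumed).
    scores = mem_keys @ query
    # Indices of the k highest-scoring memories, in no particular order.
    top = np.argpartition(scores, -k)[-k:]
    # Softmax over just those k scores, so attention cost depends on k, not n.
    s = scores[top] / np.sqrt(query.shape[0])
    weights = np.exp(s - s.max())
    weights /= weights.sum()
    # Weighted sum of the recalled values.
    return weights @ mem_values[top], top

# Toy usage: a log of 10,000 stored activations, recall only 32 of them.
rng = np.random.default_rng(0)
keys = rng.normal(size=(10_000, 64))
values = rng.normal(size=(10_000, 64))
out, recalled = knn_memory_attention(rng.normal(size=64), keys, values, k=32)
```

Nothing in the log is trained here: keys and values are just appended as they are produced, and the only thing that changes at lookup time is which k rows get selected.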