by lucidrains 1487 days ago
have an implementation of this over at https://github.com/lucidrains/memorizing-transformers-pytorc..., for any researcher exploring retrieval and memory with attention networks
2 comments

Dude, your repos are great, marvellous code quality too for cutting-edge papers. Keep it up!
hey thanks! :^) hope someone makes the next big discovery with them
Neat! Can you explain what the kNN is doing? I can’t quite follow the paper.
It's a sparse attention scheme. They store and reuse activations, thus "memorising" the past without the need for training. In order to keep the sequence short enough to fit into memory, they recall only the k most similar memories from a much larger log.
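If it helps, here is a rough sketch of just that recall step in plain Python. It is not the code from the repo or the paper: the function name, the dot-product similarity, and the toy data are all assumptions made for illustration, and it leaves out everything else in the model. The idea is only that the log of past keys and values is stored as-is, and attention runs over the k best matches instead of the whole log.

```python
import math

def knn_memory_attention(query, mem_keys, mem_values, k):
    """Hypothetical helper: attend over only the k stored memories
    most similar to the query (dot-product similarity is an assumption)."""
    # Similarity of the query to every stored key.
    scores = [sum(q * x for q, x in zip(query, key)) for key in mem_keys]
    # Indices of the k highest-scoring memories; the rest are ignored.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the k retrieved scores only, scaled by sqrt(dim).
    scale = 1.0 / math.sqrt(len(query))
    top_scores = [scores[i] * scale for i in top]
    peak = max(top_scores)
    exps = [math.exp(s - peak) for s in top_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the retrieved values.
    dim = len(mem_values[0])
    out = [sum(w * mem_values[i][d] for w, i in zip(weights, top))
           for d in range(dim)]
    return out, top


# Toy "log" of past (key, value) activations: stored, never trained.
mem_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
mem_values = [[10.0], [20.0], [30.0], [40.0]]

out, retrieved = knn_memory_attention([1.0, 0.0], mem_keys, mem_values, k=2)
print(retrieved, out)
```

With k=2 the query only ever sees the two closest memories, so the cost of the attention step depends on k rather than on how long the log has grown. The exhaustive scan used here to find them is the naive part; a real system would presumably use an approximate nearest-neighbour index for that.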