HN Mirror



	by lucidrains 1487 days ago
	have an implementation of this over at https://github.com/lucidrains/memorizing-transformers-pytorc..., for any researcher exploring retrieval and memory with attention networks

2 comments

knrz 1487 days ago

Dude, your repos are great, marvellous code quality too for cutting-edge papers. Keep it up!


lucidrains 1487 days ago

hey thanks! :^) hope someone makes the next big discovery with them


silencedogood3 1487 days ago

Neat! Can you explain what the kNN is doing? I can’t quite follow the paper.


visarga 1487 days ago

It's a sparse attention scheme. They store and reuse activations, thus "memorising" the past without the need for training. To keep the sequence short enough to fit into memory, they recall only the k most similar memories from a much larger log.
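
A rough sketch of that recall step, in plain Python so it runs anywhere. This is only an illustration of "attend over the k most similar stored entries", not the paper's exact method and not the API of the linked repo. The function name `knn_memory_attention`, the dot-product similarity, and the exhaustive search are all assumptions made for the example; a real system would more likely use an approximate nearest-neighbour index over a much larger log.

```python
import math


def knn_memory_attention(query, mem_keys, mem_values, k):
    """Attend over only the k stored keys most similar to the query.

    Hypothetical helper for illustration: query is a list of floats,
    mem_keys / mem_values are lists of equally sized float lists.
    Returns (output_vector, indices_of_recalled_memories).
    """
    # Similarity of the query to every stored key (exact, brute-force search).
    scores = [sum(q * x for q, x in zip(query, key)) for key in mem_keys]

    # Indices of the k best-matching memories, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Scaled softmax over just those k scores (max subtracted for stability).
    scale = math.sqrt(len(query))
    best = max(scores[i] for i in top)
    weights = [math.exp((scores[i] - best) / scale) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum of the recalled values.
    dim = len(mem_values[0])
    out = [sum(w * mem_values[i][d] for w, i in zip(weights, top))
           for d in range(dim)]
    return out, top


if __name__ == "__main__":
    # A tiny "log" of four stored key/value pairs.
    mem_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    mem_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    out, top = knn_memory_attention([1.0, 0.0], mem_keys, mem_values, k=2)
    print(top, out)
```

With the query `[1.0, 0.0]` and `k=2`, the two closest stored keys are entries 0 and 2, so only their values get mixed into the output; the other two entries are never touched. That is the sparsity: the cost of the softmax depends on k, not on the size of the log.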
