
	by blackbear_ 1487 days ago
	> On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined functions and theorems during test time.

	Train on test, improved performance on test. Wow.

2 comments

visarga 1487 days ago

> Wow.

Transformers are tightly limited by the size of the attention window: they can take a few thousand tokens at most. Your data might not fit into that window, and you may not want to fine-tune the model either. This paper offers a solution.


spullara 1487 days ago

It isn't being trained on test. Kind of the point of memory is that you can change the memory at will and don't need to train on new information you have never seen before.
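
To make the distinction concrete, here is a minimal sketch of an external memory that is written to and read from without any training step. It is an illustration only, not the paper's implementation: the `KeyValueMemory` class, its method names, the dot-product scoring, and the toy 2-dimensional keys are all assumptions made up for this example.

```python
class KeyValueMemory:
    """Hypothetical external memory: plain (key, value) storage, no trainable weights."""

    def __init__(self):
        self.keys = []
        self.values = []

    def add(self, key, value):
        # Writing to memory is just an append; no gradient step is involved.
        self.keys.append(list(key))
        self.values.append(value)

    def clear(self):
        # The memory can be wiped or swapped at will.
        self.keys.clear()
        self.values.clear()

    def lookup(self, query, k=1):
        # Rank stored entries by dot-product similarity to the query
        # and return the values of the k best matches.
        def score(entry):
            key, _ = entry
            return sum(q * x for q, x in zip(query, key))

        ranked = sorted(zip(self.keys, self.values), key=score, reverse=True)
        return [value for _, value in ranked[:k]]


memory = KeyValueMemory()
# New definitions seen for the first time at test time (toy keys, assumed).
memory.add([1.0, 0.0], "def square(x): return x * x")
memory.add([0.0, 1.0], "Lemma A: every bounded monotone sequence converges")

top_match = memory.lookup([0.9, 0.1], k=1)
print(top_match)
```

The model's parameters never appear in this sketch: new information becomes usable the moment it is added, and it disappears again after `clear()`.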
