Hacker News new | ask | show | jobs
by samus 748 days ago
Current models can already argued to have something like working memory by storing information in little-used parts of the tokens. If placeholder tokens are handed to them that they can use as working memory, performance improves.

https://openreview.net/forum?id=2dnO3LLiJ1

https://news.ycombinator.com/item?id=40329675