Hacker News new | ask | show | jobs
by tmalsburg2 412 days ago
Isn't this exactly the point of this model? No need to memorize everything (which makes transfomers expensive), just keep the relevant info. SSM are essentially recurrent models.
1 comments

You can't always know what will be "relevant info" in the future. Even humans can't do this but whenever that's an issue, we just go back and re-read, re-watch etc.

None of these modern recurrent architecture have a way to do this.

How often do you go back an rewatch earlier parts of a movie? I hardly ever do this. In the cinema, theater, or when listening to the radio it’s simply impossible and it still works.
You are mentioning avenues that are largely for entertainment. Sure you might not go back to re-attend for those. If you will be tested or are doing research, are you really looking at a large source once ?
It’s do easy to come up with serious non-entertainment examples, I‘m sure you don’t need my help finding them.